Authors:Adam Coscia, Alex Endert
Abstract:
Intelligence analysts perform sensemaking over collections of documents using various visual and analytic techniques to gain insights from large amounts of text. As data scales grow, our work explores how to leverage two AI technologies, large language models (LLMs) and knowledge graphs (KGs), in a visual text analysis tool, enhancing sensemaking and helping analysts keep pace. Collaborating with intelligence community experts, we developed a visual analytics system called VisPile. VisPile integrates an LLM and a KG into various UI functions that assist analysts in grouping documents into piles, performing sensemaking tasks like summarization and relationship mapping on piles, and validating LLM- and KG-generated evidence. Our paper describes the tool, as well as feedback received from six professional intelligence analysts that used VisPile to analyze a text document corpus.
Authors:Thomas C. Smits, Nikolay Akhmetov, Tiffany S. Liaw, Mark S. Keller, Eric Mörth, Nils Gehlenborg
Abstract:
Summary: Cell population plots are visualizations showing cell population distributions in biological samples with single-cell data, traditionally shown with stacked bar charts. Here, we address issues with this approach, particularly its limited scalability with increasing number of cell types and samples, and present scellop, a novel interactive cell population viewer combining visual encodings optimized for common user tasks in studying populations of cells across samples or conditions. Availability and Implementation: Scellop is available under the MIT licence at https://github.com/hms-dbmi/scellop, and is available on PyPI (https://pypi.org/project/cellpop/) and NPM (https://www.npmjs.com/package/cellpop). A demo is available at https://scellop.netlify.app/.
Authors:Mukilan Karuppasamy, Shankar Gangisetty, Shyam Nandan Rai, Carlo Masone, C V Jawahar
Abstract:
Autonomous driving (AD) systems are becoming increasingly capable of handling complex tasks, mainly due to recent advances in deep learning and AI. As interactions between autonomous systems and humans increase, the interpretability of decision-making processes in driving systems becomes increasingly crucial for ensuring safe driving operations. Successful human-machine interaction requires understanding the underlying representations of the environment and the driving task, which remains a significant challenge in deep learning-based systems. To address this, we introduce the task of interpretability in maneuver prediction before they occur for driver safety, i.e., driver intent prediction (DIP), which plays a critical role in AD systems. To foster research in interpretable DIP, we curate the eXplainable Driving Action Anticipation Dataset (DAAD-X), a new multimodal, ego-centric video dataset to provide hierarchical, high-level textual explanations as causal reasoning for the driver's decisions. These explanations are derived from both the driver's eye-gaze and the ego-vehicle's perspective. Next, we propose Video Concept Bottleneck Model (VCBM), a framework that generates spatio-temporally coherent explanations inherently, without relying on post-hoc techniques. Finally, through extensive evaluations of the proposed VCBM on the DAAD-X dataset, we demonstrate that transformer-based models exhibit greater interpretability than conventional CNN-based models. Additionally, we introduce a multilabel t-SNE visualization technique to illustrate the disentanglement and causal correlation among multiple explanations. Our data, code and models are available at: https://mukil07.github.io/VCBM.github.io/
Authors:Siqi Zhu, David Zhang, Pedro Cisneros-Velarde, Jiaxuan You
Abstract:
Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional alignment practices typically assume that maximizing model reward also maximizes user welfare, but this assumption frequently fails in practice: models may over-clarify or generate overly verbose reasoning when users prefer concise answers. Such behaviors resemble the prisoner's dilemma, where individually rational choices lead to socially suboptimal outcomes. The fundamental challenge is the lack of a principled decision making mechanism that mutually benefits both the LLM and the user. We propose Game-Theoretic Alignment (GTAlign), an alignment framework that integrates game-theoretic decision making into both reasoning and training. During reasoning, the model explicitly treats user-LLM interaction as a strategic game: it constructs payoff matrices within its reasoning chain to estimate welfare for both itself and the user, and then selects actions that are mutually beneficial. During training, we introduce a mutual welfare reward that reinforces cooperative responses, aligning model behavior with socially efficient outcomes. In addition, we introduce an inference technique that leverages game-theoretic reasoning to dynamically adapt LLM's response when pricing policies of LLM service change. Extensive experiments demonstrate that GTAlign substantially improves reasoning efficiency, answer quality, and mutual welfare compared to baselines across diverse tasks. The code is available at https://github.com/ulab-uiuc/GTAlign .
Authors:Hans G. W. van Dam
Abstract:
Advances in large language models (LLMs) and real-time speech recognition now make it possible to issue any graphical user interface (GUI) action through natural language and receive the corresponding system response directly through the GUI. Most production applications were never designed with speech in mind. This article provides a concrete architecture that enables GUIs to interface with LLM-based speech-enabled assistants. The architecture makes an application's navigation graph and semantics available through the Model Context Protocol (MCP). The ViewModel, part of the MVVM (Model-View-ViewModel) pattern, exposes the application's capabilities to the assistant by supplying both tools applicable to a currently visible view and application-global tools extracted from the GUI tree router. This architecture facilitates full voice accessibility while ensuring reliable alignment between spoken input and the visual interface, accompanied by consistent feedback across modalities. It future-proofs apps for upcoming OS super assistants that employ computer use agents (CUAs) and natively consume MCP if an application provides it. To address concerns about privacy and data security, the practical effectiveness of locally deployable, open-weight LLMs for speech-enabled multimodal UIs is evaluated. Findings suggest that recent smaller open-weight models approach the performance of leading proprietary models in overall accuracy and require enterprise-grade hardware for fast responsiveness. A demo implementation of the proposed architecture can be found at https://github.com/hansvdam/langbar
中文: 大型语言模型和实时语音识别的进步使得通过自然语言控制图形用户界面成为可能,本文提出的架构利用模型上下文协议暴露应用程序功能,实现全语音交互,并通过本地部署开源模型保障隐私安全。
English: Recent advances in LLMs and speech recognition enable natural language control of GUI actions, with a proposed architecture using the Model Context Protocol to expose application capabilities for voice accessibility while maintaining privacy through local deployment of open-weight models.
Authors:João Palmeiro, Diogo Duarte, Rita Costa, Pedro Bizarro
Abstract:
AI models are increasingly used for data analysis and visualization, yet benchmarks rarely address scatterplot-specific tasks, limiting insight into performance. To address this gap for one of the most common chart types, we introduce a synthetic, annotated dataset of over 18,000 scatterplots from six data generators and 17 chart designs, and a benchmark based on it. We evaluate proprietary models from OpenAI and Google using N-shot prompting on five distinct tasks derived from annotations of cluster bounding boxes, their center coordinates, and outlier coordinates. OpenAI models and Gemini 2.5 Flash, especially when prompted with examples, are viable options for counting clusters and, in Flash's case, outliers (90%+ Accuracy). However, the results for localization-related tasks are unsatisfactory: Precision and Recall are near or below 50%, except for Flash in outlier identification (65.01%). Furthermore, the impact of chart design on performance appears to be a secondary factor, but it is advisable to avoid scatterplots with wide aspect ratios (16:9 and 21:9) or those colored randomly. Supplementary materials are available at https://github.com/feedzai/biy-paper.
中文摘要:本研究针对散点图任务提出了一个评估AI模型的基准,发现尽管OpenAI和Gemini模型在聚类计数和异常值检测方面表现良好,但在定位相关任务上表现欠佳。
English Summary: This study introduces a benchmark for evaluating AI models on scatterplot tasks, finding that while OpenAI and Gemini models perform well in cluster counting and outlier detection, they struggle significantly with localization tasks.
Authors:Motoki Sato, Yuki Matsushita, Hidekazu Takahashi, Tomoaki Kakazu, Sou Nagata, Mizuho Ohnuma, Atsushi Yoshikawa, Masayuki Yamamura
Abstract:
Patients awaiting invasive procedures often have unanswered pre-procedural questions; however, time-pressured workflows and privacy constraints limit personalized counseling. We present LENOHA (Low Energy, No Hallucination, Leave No One Behind Architecture), a safety-first, local-first system that routes inputs with a high-precision sentence-transformer classifier and returns verbatim answers from a clinician-curated FAQ for clinical queries, eliminating free-text generation in the clinical path. We evaluated two domains (tooth extraction and gastroscopy) using expert-reviewed validation sets (n=400/domain) for thresholding and independent test sets (n=200/domain). Among the four encoders, E5-large-instruct (560M) achieved an overall accuracy of 0.983 (95% CI 0.964-0.991), AUC 0.996, and seven total errors, which were statistically indistinguishable from GPT-4o on this task; Gemini made no errors on this test set. Energy logging shows that the non-generative clinical path consumes ~1.0 mWh per input versus ~168 mWh per small-talk reply from a local 8B SLM, a ~170x difference, while maintaining ~0.10 s latency on a single on-prem GPU. These results indicate that near-frontier discrimination and generation-induced errors are structurally avoided in the clinical path by returning vetted FAQ answers verbatim, supporting privacy, sustainability, and equitable deployment in bandwidth-limited environments.
中文: LENOHA系统通过高精度分类器从临床医生整理的常见问题中检索原文答案,以接近完美的准确率解决患者术前疑问,同时大幅降低能耗并规避生成式AI错误。
English: The LENOHA system addresses pre-procedural patient inquiries by using a high-precision classifier to retrieve verbatim answers from clinician-curated FAQs, achieving near-perfect accuracy while consuming minimal energy and avoiding generative AI errors.
Authors:Ricardo Gonzalez Penuela, Felipe Arias-Russi, Victor Capriles
Abstract:
Multimodal large language models (MLLMs) have been integrated into visual interpretation applications to support Blind and Low Vision (BLV) users because of their accuracy and ability to provide rich, human-like interpretations. However, these applications often default to comprehensive, lengthy descriptions regardless of context. This leads to inefficient exchanges, as users must go through irrelevant details rather than receiving the specific information they are likely to seek. To deliver more contextually-relevant information, we developed a system that draws on historical BLV users questions. When given an image, our system identifies similar past visual contexts from the VizWiz-LF dataset and uses the associated questions to guide the MLLM generate descriptions more relevant to BLV users. An evaluation with three human labelers who revised 92 context-aware and context-free descriptions showed that context-aware descriptions anticipated and answered users' questions in 76.1% of cases (70 out of 92) and were preferred in 54.4% of comparisons (50 out of 92). Our paper reviews, and data analysis are publicly available in a Github repository at https://github.com/rgonzalezp/guiding-multimodal-large-language-models-with-blind-and-low-vision-people-visual-questions .
中文: 研究人员开发了一种系统,利用盲人和低视力用户的历史提问引导多模态大语言模型生成更贴合情境的图像描述,相比通用描述显著提升了信息相关性和用户偏好。
English: Researchers developed a system that uses historical BLV user questions to guide multimodal large language models in generating more contextually relevant image descriptions, significantly improving relevance and user preference over generic descriptions.
Authors:Yanzhe Chen, Kevin Qinghong Lin, Mike Zheng Shou
Abstract:
While recent generative models advance pixel-space video synthesis, they remain limited in producing professional educational videos, which demand disciplinary knowledge, precise visual structures, and coherent transitions, limiting their applicability in educational scenarios. Intuitively, such requirements are better addressed through the manipulation of a renderable environment, which can be explicitly controlled via logical commands (e.g., code). In this work, we propose Code2Video, a code-centric agent framework for generating educational videos via executable Python code. The framework comprises three collaborative agents: (i) Planner, which structures lecture content into temporally coherent flows and prepares corresponding visual assets; (ii) Coder, which converts structured instructions into executable Python codes while incorporating scope-guided auto-fix to enhance efficiency; and (iii) Critic, which leverages vision-language models (VLM) with visual anchor prompts to refine spatial layout and ensure clarity. To support systematic evaluation, we build MMMC, a benchmark of professionally produced, discipline-specific educational videos. We evaluate MMMC across diverse dimensions, including VLM-as-a-Judge aesthetic scores, code efficiency, and particularly, TeachQuiz, a novel end-to-end metric that quantifies how well a VLM, after unlearning, can recover knowledge by watching the generated videos. Our results demonstrate the potential of Code2Video as a scalable, interpretable, and controllable approach, achieving 40% improvement over direct code generation and producing videos comparable to human-crafted tutorials. The code and datasets are available at https://github.com/showlab/Code2Video.
中文摘要:Code2Video是一个基于代码的框架,通过三个协作智能体将教学指令转化为可执行的Python代码来生成教育视频,在质量和效率上相比传统方法实现显著提升。
English Summary: Code2Video is a code-driven framework that uses collaborative agents to generate educational videos through executable Python code, achieving significant improvements in quality and efficiency over traditional methods.
Authors:Satoshi Hashiguchi, Yuta Kataoka, Asako Kimura, Shohei Mori
Abstract:
Mediated reality, where augmented reality (AR) and diminished reality (DR) meet, enables visual modifications to real-world objects. A physical object with a mediated reality visual change retains its original physical properties. However, it is perceived differently from the original when interacted with. We present such a mediated reality object, a stick with different lengths or a stick with a missing portion in the middle, to investigate how users perceive its weight and center of gravity. We conducted two user studies (N=10), each of which consisted of two substudies. We found that the length of mediated reality sticks influences the perceived weight. A longer stick is perceived as lighter, and vice versa. The stick with a missing portion tends to be recognized as one continuous stick. Thus, its weight and center of gravity (COG) remain the same. We formulated the relationship between inertia based on the reported COG and perceived weight in the context of dynamic touch.
Authors:Suli Wang, Yangshen Deng, Zhenghua Bao, Xinyu Zhan, Yiqun Duan
Abstract:
Large-scale foundation models for EEG signals offer a promising path to generalizable brain-computer interface (BCI) applications, but they often suffer from misalignment between pretraining objectives and downstream tasks, as well as significant cross-subject distribution shifts. This paper addresses these challenges by introducing a two-stage alignment strategy that bridges the gap between generic pretraining and specific EEG decoding tasks. First, we propose NeuroTTT: a domain-specific self-supervised fine-tuning paradigm that augments the foundation model with task-relevant self-supervised objectives, aligning latent representations to important spectral, spatial, and temporal EEG features without requiring additional labeled data. Second, we incorporate test-time training (TTT) at inference, we perform (i) self-supervised test-time training on individual unlabeled test samples and (ii) prediction entropy minimization (Tent), which updates only normalization statistics to continually calibrate the model to each new input on the fly. Our approach, which, to our knowledge, is the first to unify domain-tuned self-supervision with test-time training in large-scale EEG foundation models, yields substantially improved robustness and accuracy across diverse BCI tasks (imagined speech, stress detection, motor imagery). Using CBraMod and LaBraM as backbones, our method pushes their performance to a markedly higher level. Results on three diverse tasks demonstrate that the proposed alignment strategy achieves state-of-the-art performance, outperforming conventional fine-tuning and adaptation methods. Our code is available at https://github.com/wsl2000/NeuroTTT.
中文: 本文提出一种两阶段对齐策略,通过领域特定的自监督微调和测试时训练相结合,解决了脑电基础模型中预训练目标与下游任务不匹配及跨被试分布差异的问题,显著提升了多种脑机接口任务的性能。
English: This paper introduces a two-stage alignment strategy combining domain-specific self-supervised fine-tuning and test-time training to enhance EEG foundation models' performance across various BCI tasks by addressing pretraining-task misalignment and cross-subject distribution shifts.
Authors:S. Sandra Bae, Takanori Fujiwara, Danielle Albers Szafir, Ellen Yi-Luen Do, Michael L. Rivera
Abstract:
Producing interactive 3D printed objects currently requires laborious 3D design and post-instrumentation with off-the-shelf electronics. Multi-material 3D printing using conductive PLA presents opportunities to mitigate these challenges. We present a computational design pipeline that embeds multiple capacitive touchpoints into any 3D model that has a closed mesh without self-intersection. With our pipeline, users define touchpoints on the 3D object's surface to indicate interactive regions. Our pipeline then automatically generates a conductive path to connect the touch regions. This path is optimized to output unique resistor-capacitor delays when each region is touched, resulting in all regions being able to be sensed through a double-wire or single-wire connection. We illustrate our approach's utility with five computational and sensing performance evaluations (achieving 93.35% mean accuracy for single-wire) and six application examples. Our sensing technique supports existing uses (e.g., prototyping) and highlights the growing promise to produce interactive devices entirely with 3D printing. Project website: https://github.com/d-rep-lab/3dp-singlewire-sensing
中文: 本研究提出一种计算设计流程,可在三维模型中嵌入电容式触摸点,通过优化导电路径和单线高精度传感实现交互功能,推动了全3D打印交互设备的创新发展。
English: This study introduces a computational design pipeline that embeds capacitive touchpoints into 3D models, enabling interactive objects through optimized conductive paths and single-wire sensing with high accuracy, advancing fully 3D printed interactive devices.
Authors:Daniel Platnick, Mohamed E. Bengueddache, Marjan Alirezaie, Dava J. Newman, Alex ''Sandy'' Pentland, Hossein Rahnama
Abstract:
Generative agents powered by language models are increasingly deployed for long-horizon tasks. However, as long-term memory context grows over time, they struggle to maintain coherence. This deficiency leads to critical failures, including identity drift, ignoring established beliefs, and the propagation of hallucinations in multi-agent systems. To mitigate these challenges, this paper introduces Identity Retrieval-Augmented Generation (ID-RAG), a novel mechanism designed to ground an agent's persona and persistent preferences in a dynamic, structured identity model: a knowledge graph of core beliefs, traits, and values. During the agent's decision loop, this model is queried to retrieve relevant identity context, which directly informs action selection. We demonstrate this approach by introducing and implementing a new class of ID-RAG enabled agents called Human-AI Agents (HAis), where the identity model is inspired by the Chronicle structure used in Perspective-Aware AI, a dynamic knowledge graph learned from a real-world entity's digital footprint. In social simulations of a mayoral election, HAis using ID-RAG outperformed baseline agents in long-horizon persona coherence - achieving higher identity recall across all tested models by the fourth timestep - and reduced simulation convergence time by 19% (GPT-4o) and 58% (GPT-4o mini). By treating identity as an explicit, retrievable knowledge structure, ID-RAG offers a foundational approach for developing more temporally coherent, interpretable, and aligned generative agents. Our code is open-source and available at: https://github.com/flybits/humanai-agents.
中文摘要:本文提出身份检索增强生成(ID-RAG)机制,通过动态知识图谱保持生成式智能体在长期任务中的人格一致性,显著提升身份记忆能力并缩短仿真时间。
English Summary: This paper introduces Identity Retrieval-Augmented Generation (ID-RAG), a novel mechanism that uses a dynamic knowledge graph to maintain generative agents' persona coherence during long-term tasks, significantly improving identity recall and reducing simulation time.
Authors:Hannah Kim, Kushan Mitra, Chen Shen, Dan Zhang, Estevam Hruschka
Abstract:
Large language models (LLMs) are being increasingly used for planning in orchestrated multi-agent systems. However, existing LLM-based approaches often fall short of human expectations and, critically, lack effective mechanisms for users to inspect, understand, and control their behaviors. These limitations call for enhanced transparency, controllability, and human oversight. To address this, we introduce AIPOM, a system supporting human-in-the-loop planning through conversational and graph-based interfaces. AIPOM enables users to transparently inspect, refine, and collaboratively guide LLM-generated plans, significantly enhancing user control and trust in multi-agent workflows. Our code and demo video are available at https://github.com/megagonlabs/aipom.
中文: AIPOM系统通过对话和图界面增强多智能体规划中的透明度与用户控制,有效解决了当前LLM方法在可检查性和人工监督方面的不足。
English: The AIPOM system introduces conversational and graph-based interfaces to enhance transparency and user control in LLM-driven multi-agent planning, addressing current limitations in inspectability and human oversight.
Authors:Suli Wang, Yang-yang Li, Siqi Cai, Haizhou Li
Abstract:
Decoding speech from stereo-electroencephalography (sEEG) signals has emerged as a promising direction for brain-computer interfaces (BCIs). Its clinical applicability, however, is limited by the inherent non-stationarity of neural signals, which causes domain shifts between training and testing, undermining decoding reliability. To address this challenge, a two-stage framework is proposed for enhanced robustness. First, a multi-scale decomposable mixing (MDM) module is introduced to model the hierarchical temporal dynamics of speech production, learning stable multi-timescale representations from sEEG signals. Second, a source-free online test-time adaptation (TTA) method performs entropy minimization to adapt the model to distribution shifts during inference. Evaluations on the public DU-IN spoken word decoding benchmark show that the approach outperforms state-of-the-art models, particularly in challenging cases. This study demonstrates that combining invariant feature learning with online adaptation is a principled strategy for developing reliable BCI systems. Our code is available at https://github.com/lyyi599/MDM-TENT.
中文: 本研究提出一个结合多尺度特征学习和在线适应的两阶段框架,旨在提高从立体脑电图信号解码语音的鲁棒性,并在基准测试中展现出优越性能。
English: This study introduces a two-stage framework combining multi-scale feature learning with online adaptation to enhance the robustness of speech decoding from sEEG signals, demonstrating superior performance on benchmark tests.
Authors:Daniel Pahr, Sara Di Bartolomeo
Abstract:
The NASA task load index (short: NASA-TLX) is a common metric to evaluate the workload of a user in a visualization study. Yet, it is rarely performed as initially intended, as the sources-of-workload evaluation is often omitted for various reasons. We conduct an online survey to investigate the task load of administering different versions of the NASA-TLX in a meta-study using the ReVISit framework. Our results show that it is not the slight increase in experiment time, but rather participants' frustration with the procedure, that contributes to the slight increase in task load when using the full version of the TLX compared to using a shortened version. However, we also show that the full version can shine a different and more faceted light on workload by adding a personal dimension to the data. We propose that a compact version of the sources-of-workload questionnaire can mitigate both time loss and frustration for study participants, while still providing the same data as the original procedure. The online study can be found and interactively explored on https://dpahr.github.io/tlxtlx/, and the source for the study, as well as the code for our analysis, can be found on https://github.com/dpahr/tlxtlx/.
中文: 研究表明,完整版NASA-TLX虽因参与者挫败感略微增加任务负荷,但能提供更全面的工作负荷视角,建议采用精简版在保持数据质量的同时缓解这些问题。
English: The study reveals that while the full NASA-TLX version slightly increases task load due to participant frustration rather than time, it offers richer workload insights, and a compact version is proposed to reduce these issues while maintaining data quality.
Authors:Hao Yang, Weijie Qiu, Ru Zhang, Zhou Fang, Ruichao Mao, Xiaoyu Lin, Maji Huang, Zhaosong Huang, Teng Guo, Shuoyang Liu, Hai Rao
Abstract:
Although Multimodal Large Language Models (MLLMs) have been widely applied across domains, they are still facing challenges in domain-specific tasks, such as User Interface (UI) understanding accuracy and UI generation quality. In this paper, we introduce UI-UG (a unified MLLM for UI Understanding and Generation), integrating both capabilities. For understanding tasks, we employ Supervised Fine-tuning (SFT) combined with Group Relative Policy Optimization (GRPO) to enhance fine-grained understanding on the modern complex UI data. For generation tasks, we further use Direct Preference Optimization (DPO) to make our model generate human-preferred UIs. In addition, we propose an industrially effective workflow, including the design of an LLM-friendly domain-specific language (DSL), training strategies, rendering processes, and evaluation metrics. In experiments, our model achieves state-of-the-art (SOTA) performance on understanding tasks, outperforming both larger general-purpose MLLMs and similarly-sized UI-specialized models. Our model is also on par with these larger MLLMs in UI generation performance at a fraction of the computational cost. We also demonstrate that integrating understanding and generation tasks can improve accuracy and quality for both tasks. Code and Model: https://github.com/neovateai/UI-UG
中文: 本文提出UI-UG这一统一多模态大语言模型,整合了用户界面理解与生成能力,在理解任务上达到最优性能,并以更低计算成本实现了与更大模型相当的界面生成质量。
English: This paper introduces UI-UG, a unified Multimodal Large Language Model that integrates UI understanding and generation, achieving state-of-the-art performance in understanding tasks and comparable generation quality to larger models with significantly lower computational cost.
Authors:Changde Du, Yizhuo Lu, Zhongyu Huang, Yi Sun, Zisen Zhou, Shaozheng Qin, Huiguang He
Abstract:
The ability to represent emotion plays a significant role in human cognition and social interaction, yet the high-dimensional geometry of this affective space and its neural underpinnings remain debated. A key challenge, the `behavior-neural gap,' is the limited ability of human self-reports to predict brain activity. Here we test the hypothesis that this gap arises from the constraints of traditional rating scales and that large-scale similarity judgments can more faithfully capture the brain's affective geometry. Using AI models as `cognitive agents,' we collected millions of triplet odd-one-out judgments from a multimodal large language model (MLLM) and a language-only model (LLM) in response to 2,180 emotionally evocative videos. We found that the emergent 30-dimensional embeddings from these models are highly interpretable and organize emotion primarily along categorical lines, yet in a blended fashion that incorporates dimensional properties. Most remarkably, the MLLM's representation predicted neural activity in human emotion-processing networks with the highest accuracy, outperforming not only the LLM but also, counterintuitively, representations derived directly from human behavioral ratings. This result supports our primary hypothesis and suggests that sensory grounding--learning from rich visual data--is critical for developing a truly neurally-aligned conceptual framework for emotion. Our findings provide compelling evidence that MLLMs can autonomously develop rich, neurally-aligned affective representations, offering a powerful paradigm to bridge the gap between subjective experience and its neural substrates. Project page: https://reedonepeck.github.io/ai-emotion.github.io/.
Authors:Sasan Sharifipour, Constantino Ãlvarez Casado, Le Nguyen, Tharindu Ekanayake, Manuel Lage Cañellas, Nhi Nguyen, Miguel Bordallo López
Abstract:
Human Activity Recognition supports applications in healthcare, manufacturing, and human-machine interaction. LiDAR point clouds offer a privacy-preserving alternative to cameras and are robust to illumination. We propose a HAR method based on graph spectral analysis. Each LiDAR frame is mapped to a proximity graph (epsilon-graph) and the Laplacian spectrum is computed. Eigenvalues and statistics of eigenvectors form pose descriptors, and temporal statistics over sliding windows yield fixed vectors for classification with support vector machines and random forests. On the MM-Fi dataset with 40 subjects and 27 activities, under a strict subject-independent protocol, the method reaches 94.4% accuracy on a 13-class rehabilitation set and 90.3% on all 27 activities. It also surpasses the skeleton-based baselines reported for MM-Fi. The contribution is a compact and interpretable feature set derived directly from point cloud geometry that provides an accurate and efficient alternative to end-to-end deep learning.
中文: 本研究提出一种基于LiDAR点云的人类活动识别方法,通过构建邻近图并分析其拉普拉斯谱来生成姿态描述符,在MM-Fi数据集上准确率超过90%,同时提供了保护隐私且可解释的深度学习替代方案。
English: This study introduces a human activity recognition method using LiDAR point clouds, which constructs proximity graphs and analyzes their Laplacian spectra to create pose descriptors, achieving over 90% accuracy on the MM-Fi dataset while offering a privacy-preserving and interpretable alternative to deep learning approaches.
Authors:Ke Wang, Houxing Ren, Zimu Lu, Mingjie Zhan, Hongsheng Li
Abstract:
The growing capabilities of large language models and multimodal systems have spurred interest in voice-first AI assistants, yet existing benchmarks are inadequate for evaluating the full range of these systems' capabilities. We introduce VoiceAssistant-Eval, a comprehensive benchmark designed to assess AI assistants across listening, speaking, and viewing. VoiceAssistant-Eval comprises 10,497 curated examples spanning 13 task categories. These tasks include natural sounds, music, and spoken dialogue for listening; multi-turn dialogue, role-play imitation, and various scenarios for speaking; and highly heterogeneous images for viewing. To demonstrate its utility, we evaluate 21 open-source models and GPT-4o-Audio, measuring the quality of the response content and speech, as well as their consistency. The results reveal three key findings: (1) proprietary models do not universally outperform open-source models; (2) most models excel at speaking tasks but lag in audio understanding; and (3) well-designed smaller models can rival much larger ones. Notably, the mid-sized Step-Audio-2-mini (7B) achieves more than double the listening accuracy of LLaMA-Omni2-32B-Bilingual. However, challenges remain: multimodal (audio plus visual) input and role-play voice imitation tasks are difficult for current models, and significant gaps persist in robustness and safety alignment. VoiceAssistant-Eval identifies these gaps and establishes a rigorous framework for evaluating and guiding the development of next-generation AI assistants. Code and data will be released at https://mathllm.github.io/VoiceAssistantEval/ .
Authors:Hyeonseong Kim, Roy El-Helou, Seungbeen Lee, Sungjoon Choi, Matthew Pan
Abstract:
Playful deception, a common feature in human social interactions, remains underexplored in Human-Robot Interaction (HRI). Inspired by the Turkish Ice Cream (TIC) vendor routine, we investigate how bounded, culturally familiar forms of deception influence user trust, enjoyment, and engagement during robotic handovers. We design a robotic manipulator equipped with a custom end-effector and implement five TIC-inspired trick policies that deceptively delay the handover of an ice cream-shaped object. Through a mixed-design user study with 91 participants, we evaluate the effects of playful deception and interaction duration on user experience. Results reveal that TIC-inspired deception significantly enhances enjoyment and engagement, though reduces perceived safety and trust, suggesting a structured trade-off across the multi-dimensional aspects. Our findings demonstrate that playful deception can be a valuable design strategy for interactive robots in entertainment and engagement-focused contexts, while underscoring the importance of deliberate consideration of its complex trade-offs. You can find more information, including demonstration videos, on https://hyeonseong-kim98.github.io/turkish-ice-cream-robot/ .
Authors:Nikita Torgashov, Gustav Eje Henter, Gabriel Skantze
Abstract:
We present VoXtream, a fully autoregressive, zero-shot streaming text-to-speech (TTS) system for real-time use that begins speaking from the first word. VoXtream directly maps incoming phonemes to audio tokens using a monotonic alignment scheme and a dynamic look-ahead that does not delay onset. Built around an incremental phoneme transformer, a temporal transformer predicting semantic and duration tokens, and a depth transformer producing acoustic tokens, VoXtream achieves, to our knowledge, the lowest initial delay among publicly available streaming TTS: 102 ms on GPU. Despite being trained on a mid-scale 9k-hour corpus, it matches or surpasses larger baselines on several metrics, while delivering competitive quality in both output- and full-streaming settings. Demo and code are available at https://herimor.github.io/voxtream.
Authors:Taesoo Kim, Yongsik Jo, Hyunmin Song, Taehwan Kim
Abstract:
Human conversation involves language, speech, and visual cues, with each medium providing complementary information. For instance, speech conveys a vibe or tone not fully captured by text alone. While multimodal LLMs focus on generating text responses from diverse inputs, less attention has been paid to generating natural and engaging speech. We propose a human-like agent that generates speech responses based on conversation mood and responsive style information. To achieve this, we build a novel MultiSensory Conversation dataset focused on speech to enable agents to generate natural speech. We then propose a multimodal LLM-based model for generating text responses and voice descriptions, which are used to generate speech covering paralinguistic information. Experimental results demonstrate the effectiveness of utilizing both visual and audio modalities in conversation to generate engaging speech. The source code is available in https://github.com/kimtaesu24/MSenC
Chinese Summary: 本研究提出了一种拟人化对话代理,通过整合视觉和音频线索,利用基于新型多模态大语言模型的系统,在专门构建的多感官对话数据集上训练,实现了自然且富有吸引力的语音生成。
English Summary: This research introduces a human-like conversational agent that generates natural and engaging speech by integrating visual and audio cues, using a novel multimodal LLM-based model trained on a specialized MultiSensory Conversation dataset.
Authors:Zongru Wu, Rui Mao, Zhiyuan Tian, Pengzhou Cheng, Tianjie Ju, Zheng Wu, Lingzhong Dong, Haiyue Sheng, Zhuosheng Zhang, Gongshen Liu
Abstract:
The advent of multimodal agents facilitates effective interaction within graphical user interface (GUI), especially in ubiquitous GUI control. However, their inability to reliably execute toggle control instructions remains a key bottleneck. To investigate this, we construct a state control benchmark with binary toggle instructions from public datasets. Evaluations of existing agents demonstrate their unreliability, particularly when the current toggle state already matches the desired state. To address the challenge, we propose State-aware Reasoning (StaR), a training method that teaches agents to perceive the current toggle state, analyze the desired state from the instruction, and act accordingly. Experiments on three multimodal agents demonstrate that StaR can improve toggle instruction execution accuracy by over 30\%. Further evaluations on three public benchmarks show that StaR also enhances general task performance. Finally, evaluations on a dynamic environment highlight the potential of StaR for real-world applications. Code, benchmark, and StaR-enhanced agents are available at https://github.com/ZrW00/StaR.
中文摘要:本文提出状态感知推理(StaR)训练方法,通过教导多模态智能体感知当前切换状态并解析指令中的目标状态,将切换指令执行准确率提升超过30%,同时在多个基准测试中有效提升通用任务性能。
English Summary: This paper introduces State-aware Reasoning (StaR), a training method that significantly improves multimodal agents' accuracy in executing toggle instructions by over 30% through teaching them to perceive current states and analyze desired actions, while also enhancing general task performance across benchmarks.
Authors:Christian Fane
Abstract:
Diminished reality (DR) refers to the digital removal of real-world objects by compositing background content in their place. This thesis presents a real-time, inpainting-based DR system designed to enable privacy control in shared-space mixed reality (MR) meetings. The system allows a primary headset user to selectively remove personal or sensitive items from their environment, ensuring that those objects are no longer visible to other participants. Removal is achieved through semantic segmentation and precise object selection, followed by real-time inpainting from the viewpoint of a secondary observer, implemented using a mobile ZED 2i depth camera. The solution is designed to be portable and robust, requiring neither a fixed secondary viewpoint nor prior 3D scanning of the environment. The system utilises YOLOv11 for object detection and a modified Decoupled Spatial-Temporal Transformer (DSTT) model for high-quality video inpainting. At 720p resolution, the pipeline sustains frame rates exceeding 20 fps, demonstrating the feasibility of real-time diminished reality for practical privacy-preserving MR applications.
本论文提出了一种实时减实系统,通过语义分割和视频修复技术选择性地移除混合现实会议中的敏感物体,在720p分辨率下帧率超过20 fps,实现了隐私保护功能。
This thesis introduces a real-time diminished reality system that uses semantic segmentation and video inpainting to selectively remove sensitive objects from mixed reality meetings, achieving over 20 fps at 720p for privacy protection.
Authors:Junjie Chen, Yao Hu, Junjie Li, Kangyue Li, Kun Liu, Wenpeng Li, Xu Li, Ziyuan Li, Feiyu Shen, Xu Tang, Manzhen Wei, Yichen Wu, Fenglong Xie, Kaituo Xu, Kun Xie
Abstract:
Full-duplex voice interaction allows users and agents to speak simultaneously with controllable barge-in, enabling lifelike assistants and customer service. Existing solutions are either end-to-end, difficult to design and hard to control, or modular pipelines governed by turn-taking controllers that ease upgrades and per-module optimization; however, prior modular frameworks depend on non-open components and external providers, limiting holistic optimization. In this work, we present a complete, practical full-duplex voice interaction system comprising a turn-taking controller, an interaction module, and a dialogue manager. The controller integrates streaming personalized VAD (pVAD) to suppress false barge-ins from noise and non-primary speakers, precisely timestamp primary-speaker segments, and explicitly enable primary-speaker barge-ins; a semantic end-of-turn detector improves stop decisions. It upgrades heterogeneous half-duplex pipelines, cascaded, semi-cascaded, and speech-to-speech, to full duplex. Using internal models, we implement cascaded and semi-cascaded variants; the semi-cascaded one captures emotional and paralinguistic cues, yields more coherent responses, lowers latency and error propagation, and improves robustness. A dialogue manager extends capabilities via tool invocation and context management. We also propose three system-level metrics, barge-in, end-of-turn detection accuracy, and end-to-end latency, to assess naturalness, control accuracy, and efficiency. Experiments show fewer false interruptions, more accurate semantic ends, and lower latency approaching industrial systems, enabling robust, natural, real-time full-duplex interaction. Demos: https://fireredteam.github.io/demos/firered_chat.
中文: 本研究提出了一种实用的全双工语音交互系统,通过集成具有个性化语音活动检测和语义对话结束检测的对话轮换控制器,将多种半双工流程升级为全双工,有效减少误中断、降低延迟,并提升交互的自然度与鲁棒性。
English: This work presents a practical full-duplex voice interaction system that upgrades various half-duplex pipelines to full duplex, integrating a turn-taking controller with personalized VAD and semantic end-of-turn detection to reduce false interruptions, lower latency, and enhance interaction naturalness and robustness.
Authors:Péter Ferenc Gyarmati, Dominik Moritz, Torsten Möller, Laura Koesten
Abstract:
To address the brittleness of monolithic AI agents, our prototype for automated visual data reporting explores a Human-AI Partnership model. Its hybrid, multi-agent architecture strategically externalizes logic from LLMs to deterministic modules, leveraging the rule-based system Draco for principled visualization design. The system delivers a dual-output: an interactive Observable report with Mosaic for reader exploration, and executable Marimo notebooks for deep, analyst-facing traceability. This granular architecture yields a fully automatic yet auditable and steerable system, charting a path toward a more synergistic partnership between human experts and AI. For reproducibility, our implementation and examples are available at https://peter-gy.github.io/VISxGenAI-2025/.
Authors:Hongyi Jing, Jiafu Chen, Chen Rao, Ziqiang Dang, Jiajie Teng, Tianyi Chu, Juncheng Mo, Shuo Fang, Huaizhong Lin, Rui Lv, Chenguang Ma, Lei Zhao
Abstract:
The existing Multimodal Large Language Models (MLLMs) for GUI perception have made great progress. However, the following challenges still exist in prior methods: 1) They model discrete coordinates based on text autoregressive mechanism, which results in lower grounding accuracy and slower inference speed. 2) They can only locate predefined sets of elements and are not capable of parsing the entire interface, which hampers the broad application and support for downstream tasks. To address the above issues, we propose SparkUI-Parser, a novel end-to-end framework where higher localization precision and fine-grained parsing capability of the entire interface are simultaneously achieved. Specifically, instead of using probability-based discrete modeling, we perform continuous modeling of coordinates based on a pre-trained Multimodal Large Language Model (MLLM) with an additional token router and coordinate decoder. This effectively mitigates the limitations inherent in the discrete output characteristics and the token-by-token generation process of MLLMs, consequently boosting both the accuracy and the inference speed. To further enhance robustness, a rejection mechanism based on a modified Hungarian matching algorithm is introduced, which empowers the model to identify and reject non-existent elements, thereby reducing false positives. Moreover, we present ScreenParse, a rigorously constructed benchmark to systematically assess structural perception capabilities of GUI models across diverse scenarios. Extensive experiments demonstrate that our approach consistently outperforms SOTA methods on ScreenSpot, ScreenSpot-v2, CAGUI-Grounding and ScreenParse benchmarks. The resources are available at https://github.com/antgroup/SparkUI-Parser.
中文摘要:现有GUI感知多模态大语言模型因离散坐标建模和有限元素检测存在精度与速度问题,SparkUI-Parser通过连续坐标建模和增强解析能力,在多个基准测试中实现更优性能。
English Summary: Existing multimodal large language models for GUI perception face challenges in accuracy and speed due to discrete coordinate modeling and limited element detection, which SparkUI-Parser addresses through continuous coordinate modeling and enhanced parsing capabilities to achieve superior performance across multiple benchmarks.
Authors:Kyra Wilson, Mattea Sim, Anna-Maria Gueorguieva, Aylin Caliskan
Abstract:
In this study, we conduct a resume-screening experiment (N=528) where people collaborate with simulated AI models exhibiting race-based preferences (bias) to evaluate candidates for 16 high and low status occupations. Simulated AI bias approximates factual and counterfactual estimates of racial bias in real-world AI systems. We investigate people's preferences for White, Black, Hispanic, and Asian candidates (represented through names and affinity groups on quality-controlled resumes) across 1,526 scenarios and measure their unconscious associations between race and status using implicit association tests (IATs), which predict discriminatory hiring decisions but have not been investigated in human-AI collaboration. When making decisions without AI or with AI that exhibits no race-based preferences, people select all candidates at equal rates. However, when interacting with AI favoring a particular group, people also favor those candidates up to 90% of the time, indicating a significant behavioral shift. The likelihood of selecting candidates whose identities do not align with common race-status stereotypes can increase by 13% if people complete an IAT before conducting resume screening. Finally, even if people think AI recommendations are low quality or not important, their decisions are still vulnerable to AI bias under certain circumstances. This work has implications for people's autonomy in AI-HITL scenarios, AI and work, design and evaluation of AI hiring systems, and strategies for mitigating bias in collaborative decision-making tasks. In particular, organizational and regulatory policy should acknowledge the complex nature of AI-HITL decision making when implementing these systems, educating people who use them, and determining which are subject to oversight.
中文摘要:本研究表明,人类的招聘决策会受到人工智能种族偏见的显著影响,最高可达90%的模仿率,但通过内隐联想测试提升认知后,这种影响可降低13%。
English Summary: This study reveals that people's hiring decisions are significantly influenced by AI's racial biases, often mirroring them up to 90% of the time, though awareness through implicit association tests can reduce this effect by 13%.
Authors:Sophia Bianchi Moyen, Rickmer Krohn, Sophie Lueth, Kay Pompetzki, Jan Peters, Vignesh Prasad, Georgia Chalvatzaki
Abstract:
Intuitive Teleoperation interfaces are essential for mobile manipulation robots to ensure high quality data collection while reducing operator workload. A strong sense of embodiment combined with minimal physical and cognitive demands not only enhances the user experience during large-scale data collection, but also helps maintain data quality over extended periods. This becomes especially crucial for challenging long-horizon mobile manipulation tasks that require whole-body coordination. We compare two distinct robot control paradigms: a coupled embodiment integrating arm manipulation and base navigation functions, and a decoupled embodiment treating these systems as separate control entities. Additionally, we evaluate two visual feedback mechanisms: immersive virtual reality and conventional screen-based visualization of the robot's field of view. These configurations were systematically assessed across a complex, multi-stage task sequence requiring integrated planning and execution. Our results show that the use of VR as a feedback modality increases task completion time, cognitive workload, and perceived effort of the teleoperator. Coupling manipulation and navigation leads to a comparable workload on the user as decoupling the embodiments, while preliminary experiments suggest that data acquired by coupled teleoperation leads to better imitation learning performance. Our holistic view on intuitive teleoperation interfaces provides valuable insight into collecting high-quality, high-dimensional mobile manipulation data at scale with the human operator in mind. Project website:https://sophiamoyen.github.io/role-embodiment-wbc-moma-teleop/
中文: 直观的遥操作界面通过耦合机械臂操控与底盘导航功能可提升移动操作任务的数据质量,其中虚拟现实反馈会增加操作员负担,而耦合控制模式在模仿学习性能方面展现出优势。
English: Intuitive teleoperation interfaces that couple manipulation and navigation functions can enhance data quality for mobile manipulation tasks, with VR feedback increasing operator workload while coupled control shows promise for improving imitation learning performance.
Authors:Jingru Fan, Yufan Dang, Jingyao Wu, Huatao Li, Runde Yang, Xiyuan Yang, Yuheng Wang, Zhong Zhang, Yaxi Lu, Yankai Lin, Zhiyuan Liu, Dahai Li, Chen Qian
Abstract:
With the raid evolution of large language models and multimodal foundation models, the mobile-agent landscape has proliferated without converging on the fundamental challenges. This paper identifies four core problems that must be solved for mobile agents to deliver practical, scalable impact: (1) generalization across tasks, modalities, apps, and devices; (2) accuracy, specifically precise on-screen interaction and click targeting; (3) long-horizon capability for sustained, multi-step goals; and (4) efficiency, specifically high-performance runtime on resource-constrained devices. We present AppCopilot, a multimodal, multi-agent, general-purpose on-device assistant that operates across applications and constitutes a full-stack, closed-loop system from data to deployment. AppCopilot operationalizes this position through an end-to-end autonomous pipeline spanning data collection, training, deployment, high-quality and efficient inference, and mobile application development. At the model layer, it integrates multimodal foundation models with robust Chinese-English support. At the reasoning and control layer, it combines chain-of-thought reasoning, hierarchical task planning and decomposition, and multi-agent collaboration. At the execution layer, it enables user personalization and experiential adaptation, voice interaction, function calling, cross-app and cross-device orchestration, and comprehensive mobile app support. The system design incorporates profiling-driven optimization for latency, memory, and energy across heterogeneous hardware. Empirically, AppCopilot achieves significant improvements along all four dimensions: stronger generalization, higher-precision on-screen actions, more reliable long-horizon task completion, and faster, more resource-efficient runtime.
中文摘要:本文提出AppCopilot,一种设备端多模态助手,通过融合基础模型、多智能体协作和移动端优化部署,系统性地解决了移动智能体在泛化能力、操作精度、长程任务和运行效率四大核心难题。
English Summary: This paper introduces AppCopilot, an on-device multimodal assistant designed to address four core challenges in mobile agents—generalization, accuracy, long-horizon capability, and efficiency—through an integrated system combining foundation models, multi-agent collaboration, and optimized mobile deployment.
Authors:Ranjie Duan, Jiexi Liu, Xiaojun Jia, Shiji Zhao, Ruoxi Cheng, Fengxiang Wang, Cheng Wei, Yong Xie, Chang Liu, Defeng Li, Yinpeng Dong, Yichi Zhang, Yuefeng Chen, Chongwen Wang, Xingjun Ma, Xingxing Wei, Yang Liu, Hang Su, Jun Zhu, Xinfeng Li, Yitong Sun, Jie Zhang, Jinzhao Hu, Sha Xu, Wenchao Yang, Yitong Yang, Xingyao Zhang, Yingshui Tan, Jialing Tao, Hui Xue
Abstract:
Large language models (LLMs) typically deploy safety mechanisms to prevent harmful content generation. Most current approaches focus narrowly on risks posed by malicious actors, often framing risks as adversarial events and relying on defensive refusals. However, in real-world settings, risks also come from non-malicious users seeking help while under psychological distress (e.g., self-harm intentions). In such cases, the model's response can strongly influence the user's next actions. Simple refusals may lead them to repeat, escalate, or move to unsafe platforms, creating worse outcomes. We introduce Constructive Safety Alignment (CSA), a human-centric paradigm that protects against malicious misuse while actively guiding vulnerable users toward safe and helpful results. Implemented in Oyster-I (Oy1), CSA combines game-theoretic anticipation of user reactions, fine-grained risk boundary discovery, and interpretable reasoning control, turning safety into a trust-building process. Oy1 achieves state-of-the-art safety among open models while retaining high general capabilities. On our Constructive Benchmark, it shows strong constructive engagement, close to GPT-5, and unmatched robustness on the Strata-Sword jailbreak dataset, nearing GPT-o1 levels. By shifting from refusal-first to guidance-first safety, CSA redefines the model-user relationship, aiming for systems that are not just safe, but meaningfully helpful. We release Oy1, code, and the benchmark to support responsible, user-centered AI.
中文: 现有大语言模型的安全机制常因防御性拒绝而无法帮助心理脆弱的用户,因此CSA提出以人为中心的安全对齐方法,通过预期推理和信任建立引导高危用户获得安全结果,在开源模型中实现了顶尖的安全性和通用能力。
English: Current LLM safety mechanisms often fail vulnerable users by using defensive refusals, so CSA introduces a human-centric approach that guides at-risk users toward safe outcomes through anticipatory reasoning and trust-building, achieving top safety and capability levels in open models.
Authors:Runduo Han, Yanxin Hu, Yihui Fu, Zihan Zhang, Yukai Jv, Li Chen, Lei Xie
Abstract:
Separating overlapping speech from multiple speakers is crucial for effective human-vehicle interaction. This paper proposes CabinSep, a lightweight neural mask-based minimum variance distortionless response (MVDR) speech separation approach, to reduce speech recognition errors in back-end automatic speech recognition (ASR) models. Our contributions are threefold: First, we utilize channel information to extract spatial features, which improves the estimation of speech and noise masks. Second, we employ MVDR during inference, reducing speech distortion to make it more ASR-friendly. Third, we introduce a data augmentation method combining simulated and real-recorded impulse responses (IRs), improving speaker localization at zone boundaries and further reducing speech recognition errors. With a computational complexity of only 0.4 GMACs, CabinSep achieves a 17.5% relative reduction in speech recognition error rate in a real-recorded dataset compared to the state-of-the-art DualSep model. Demos are available at: https://cabinsep.github.io/cabinsep/.
Authors:Gursimran Singh, Aviral Chharia, Rahul Upadhyay, Vinay Kumar, Luca Longo
Abstract:
Electroencephalography (EEG)-based Brain-Computer Interfaces (BCIs) have emerged as a transformative technology with applications spanning robotics, virtual reality, medicine, and rehabilitation. However, existing BCI frameworks face several limitations, including a lack of stage-wise flexibility essential for experimental research, steep learning curves for researchers without programming expertise, elevated costs due to reliance on proprietary software, and a lack of all-inclusive features leading to the use of multiple external tools affecting research outcomes. To address these challenges, we present PyNoetic, a modular BCI framework designed to cater to the diverse needs of BCI research. PyNoetic is one of the very few frameworks in Python that encompasses the entire BCI design pipeline, from stimulus presentation and data acquisition to channel selection, filtering, feature extraction, artifact removal, and finally simulation and visualization. Notably, PyNoetic introduces an intuitive and end-to-end GUI coupled with a unique pick-and-place configurable flowchart for no-code BCI design, making it accessible to researchers with minimal programming experience. For advanced users, it facilitates the seamless integration of custom functionalities and novel algorithms with minimal coding, ensuring adaptability at each design stage. PyNoetic also includes a rich array of analytical tools such as machine learning models, brain-connectivity indices, systematic testing functionalities via simulation, and evaluation methods of novel paradigms. PyNoetic's strengths lie in its versatility for both offline and real-time BCI development, which streamlines the design process, allowing researchers to focus on more intricate aspects of BCI development and thus accelerate their research endeavors. Project Website: https://neurodiag.github.io/PyNoetic
Authors:Filip J. Kucia, Bartosz Grabek, Szymon D. Trochimiak, Anna Wróblewska
Abstract:
Conversational agents powered by Large Language Models (LLMs) are increasingly utilized in educational settings, in particular in individual closed digital environments, yet their potential adoption in the physical learning environments like cultural heritage sites, museums, and art galleries remains relatively unexplored. In this study, we present Artistic Chatbot, a voice-to-voice RAG-powered chat system to support informal learning and enhance visitor engagement during a live art exhibition celebrating the 15th anniversary of the Faculty of Media Art at the Warsaw Academy of Fine Arts, Poland. The question answering (QA) chatbot responded to free-form spoken questions in Polish using the context retrieved from a curated, domain-specific knowledge base consisting of 226 documents provided by the organizers, including faculty information, art magazines, books, and journals. We describe the key aspects of the system architecture and user interaction design, as well as discuss the practical challenges associated with deploying chatbots at public cultural sites. Our findings, based on interaction analysis, demonstrate that chatbots such as Artistic Chatbot effectively maintain responses grounded in exhibition content (60\% of responses directly relevant), even when faced with unpredictable queries outside the target domain, showing their potential for increasing interactivity in public cultural sites. GitHub project page: https://github.com/cinekucia/artistic-chatbot-cikm2025
中文: 本研究开发的Artistic Chatbot语音问答系统通过检索增强生成技术,在艺术展览中为访客提供基于专业知识的回答,有效提升了文化场所的互动体验,展现了在实体学习环境中应用的可行性。
English: This study introduces Artistic Chatbot, a voice-based RAG system that effectively supports informal learning at cultural sites by providing contextually grounded responses, demonstrating its potential to enhance visitor engagement in physical settings like art exhibitions.
Authors:Saksorn Ruangtanusak, Pittawat Taveekitworachai, Kunat Pipatanakul
Abstract:
This report investigates approaches for prompting a tool-augmented large language model (LLM) to act as a role-playing dialogue agent in the API track of the Commonsense Persona-grounded Dialogue Challenge (CPDC) 2025. In this setting, dialogue agents often produce overly long in-character responses (over-speaking) while failing to use tools effectively according to the persona (under-acting), such as generating function calls that do not exist or making unnecessary tool calls before answering. We explore four prompting approaches to address these issues: 1) basic role prompting, 2) human-crafted role prompting, 3) automatic prompt optimization (APO), and 4) rule-based role prompting. The rule-based role prompting (RRP) approach achieved the best performance through two novel techniques--character-card/scene-contract design and strict enforcement of function calling--which led to an overall score of 0.571, improving on the zero-shot baseline score of 0.519. These findings demonstrate that RRP design can substantially improve the effectiveness and reliability of role-playing dialogue agents compared with more elaborate methods such as APO. To support future efforts in developing persona prompts, we are open-sourcing all of our best-performing prompts and the APO tool. Source code is available at https://github.com/scb-10x/apo.
中文: 本研究探索了四种提示方法,通过角色卡片设计和严格函数调用优化角色扮演对话代理的过度发言和行动不足问题,其中基于规则的提示方法表现最佳。
English: This study explores four prompting methods to enhance role-playing dialogue agents by addressing over-speaking and under-acting issues, with rule-based role prompting achieving the best performance through character-card design and strict function enforcement.
Authors:Paritosh Parmar, Eric Peh, Basura Fernando
Abstract:
Existing Causal-Why Video Question Answering (VideoQA) models often struggle with higher-order reasoning, relying on opaque, monolithic pipelines that entangle video understanding, causal inference, and answer generation. These black-box approaches offer limited interpretability and tend to depend on shallow heuristics. We propose a novel, modular framework that explicitly decouples causal reasoning from answer generation, introducing natural language causal chains as interpretable intermediate representations. Inspired by human cognitive models, these structured cause-effect sequences bridge low-level video content with high-level causal reasoning, enabling transparent and logically coherent inference. Our two-stage architecture comprises a Causal Chain Extractor (CCE) that generates causal chains from video-question pairs, and a Causal Chain-Driven Answerer (CCDA) that produces answers grounded in these chains. To address the lack of annotated reasoning traces, we introduce a scalable method for generating high-quality causal chains from existing datasets using large language models. We also propose CauCo, a new evaluation metric for causality-oriented captioning. Experiments on three large-scale benchmarks demonstrate that our approach not only outperforms state-of-the-art models, but also yields substantial gains in explainability, user trust, and generalization -- positioning the CCE as a reusable causal reasoning engine across diverse domains. Project page: https://paritoshparmar.github.io/chainreaction/
Authors:Guoping Xu, Jayaram K. Udupa, Jax Luo, Songlin Zhao, Yajun Yu, Scott B. Raymond, Hao Peng, Lipeng Ning, Yogesh Rathi, Wei Liu, You Zhang
Abstract:
Medical image segmentation has advanced rapidly over the past two decades, largely driven by deep learning, which has enabled accurate and efficient delineation of cells, tissues, organs, and pathologies across diverse imaging modalities. This progress raises a fundamental question: to what extent have current models overcome persistent challenges, and what gaps remain? In this work, we provide an in-depth review of medical image segmentation, tracing its progress and key developments over the past decade. We examine core principles, including multiscale analysis, attention mechanisms, and the integration of prior knowledge, across the encoder, bottleneck, skip connections, and decoder components of segmentation networks. Our discussion is organized around seven key dimensions: (1) the shift from supervised to semi-/unsupervised learning, (2) the transition from organ segmentation to lesion-focused tasks, (3) advances in multi-modality integration and domain adaptation, (4) the role of foundation models and transfer learning, (5) the move from deterministic to probabilistic segmentation, (6) the progression from 2D to 3D and 4D segmentation, and (7) the trend from model invocation to segmentation agents. Together, these perspectives provide a holistic overview of the trajectory of deep learning-based medical image segmentation and aim to inspire future innovation. To support ongoing research, we maintain a continually updated repository of relevant literature and open-source resources at https://github.com/apple1986/medicalSegReview
中文摘要:本文全面回顾了过去十年医学图像分割的发展历程,从七个关键维度分析了技术演进,并指出了当前挑战与未来研究方向。
English Summary: This review comprehensively examines the evolution of medical image segmentation over the past decade, analyzing key technical developments across seven critical dimensions while identifying remaining challenges and future directions.
Authors:Debanjana Kar, Leopold Böss, Dacia Braca, Sebastian Maximilian Dennerlein, Nina Christine Hubig, Philipp Wintersberger, Yufang Hou
Abstract:
The rapid adoption of LLM-based conversational systems is already transforming the landscape of educational technology. However, the current state-of-the-art learning models do not take into account the student's affective states. Multiple studies in educational psychology support the claim that positive or negative emotional states can impact a student's learning capabilities. To bridge this gap, we present MathBuddy, an emotionally aware LLM-powered Math Tutor, which dynamically models the student's emotions and maps them to relevant pedagogical strategies, making the tutor-student conversation a more empathetic one. The student's emotions are captured from the conversational text as well as from their facial expressions. The student's emotions are aggregated from both modalities to confidently prompt our LLM Tutor for an emotionally-aware response. We have evaluated our model using automatic evaluation metrics across eight pedagogical dimensions and user studies. We report a massive 23 point performance gain using the win rate and a 3 point gain at an overall level using DAMR scores which strongly supports our hypothesis of improving LLM-based tutor's pedagogical abilities by modeling students' emotions. Our dataset and code are available at: https://github.com/ITU-NLP/MathBuddy .
中文: MathBuddy是一款情感感知的数学辅导系统,通过分析学生的文本对话和面部表情动态建模情绪状态,生成具有教学策略的共情回应,在评估中取得了显著性能提升。
English: MathBuddy is an emotionally aware LLM-powered math tutor that dynamically models students' emotions from text and facial expressions to deliver empathetic, pedagogically tailored responses, achieving significant performance gains in evaluations.
Authors:Tianjun Wei, Huizhong Guo, Yingpeng Du, Zhu Sun, Chen Huang, Dongxia Wang, Jie Zhang
Abstract:
User simulation is increasingly vital to develop and evaluate recommender systems (RSs). While Large Language Models (LLMs) offer promising avenues to simulate user behavior, they often struggle with the absence of specific domain alignment required for RSs and the efficiency demands of large-scale simulation. A vast yet underutilized resource for enhancing this alignment is the extensive user feedback inherent in RSs. However, directly leveraging such feedback presents two significant challenges. First, user feedback in RSs is often ambiguous and noisy, which negatively impacts effective preference alignment. Second, the massive volume of feedback largely hinders the efficiency of preference alignment, necessitating an efficient filtering mechanism to identify more informative samples. To overcome these hurdles, we introduce a novel data construction framework that leverages user feedback in RSs with advanced LLM capabilities to generate high-quality simulation data. Our framework unfolds in two key phases: (1) employing LLMs to generate cognitive decision-making processes on constructed simulation samples, reducing ambiguity in raw user feedback; (2) data distillation based on uncertainty estimation and behavior sampling to filter challenging yet denoised simulation samples. Accordingly, we fine-tune lightweight LLMs, as user simulators, using such high-quality dataset with corresponding decision-making processes. Extensive experiments verify that our framework significantly boosts the alignment with human preferences and in-domain reasoning capabilities of fine-tuned LLMs, and provides more insightful and interpretable signals when interacting with RSs. We believe our work will advance the RS community and offer valuable insights for broader human-centric AI research.
中文摘要:本文提出了一种创新框架,利用大语言模型和用户反馈生成高质量模拟数据,通过认知决策过程和数据蒸馏技术,显著提升了推荐系统与人类偏好的对齐能力及交互可解释性。
English Summary: This paper introduces a novel framework that leverages large language models and user feedback to generate high-quality simulation data, enhancing recommender systems' alignment with human preferences and interpretability through cognitive decision-making processes and data distillation.
Authors:Wei Xiong, Jiangtong Li, Jie Li, Kun Zhu
Abstract:
Electroencephalography (EEG) foundation models are poised to significantly advance brain signal analysis by learning robust representations from large-scale, unlabeled datasets. However, their rapid proliferation has outpaced the development of standardized evaluation benchmarks, which complicates direct model comparisons and hinders systematic scientific progress. This fragmentation fosters scientific inefficiency and obscures genuine architectural advancements. To address this critical gap, we introduce EEG-FM-Bench, the first comprehensive benchmark for the systematic and standardized evaluation of EEG foundation models (EEG-FMs). Our contributions are threefold: (1) we curate a diverse suite of downstream tasks and datasets from canonical EEG paradigms, implementing standardized processing and evaluation protocols within a unified open-source framework; (2) we benchmark prominent state-of-the-art foundation models to establish comprehensive baseline results for a clear comparison of the current landscape; (3) we perform qualitative analyses of the learned representations to provide insights into model behavior and inform future architectural design. Through extensive experiments, we find that fine-grained spatio-temporal feature interaction, multitask unified training and neuropsychological priors would contribute to enhancing model performance and generalization capabilities. By offering a unified platform for fair comparison and reproducible research, EEG-FM-Bench seeks to catalyze progress and guide the community toward the development of more robust and generalizable EEG-FMs. Code is released at https://github.com/xw1216/EEG-FM-Bench.
Chinese: EEG-FM-Bench作为首个全面的基准测试,旨在标准化脑电图基础模型的评估,通过统一任务、基准结果和定性分析来解决当前领域碎片化问题,以提升模型性能并指导未来发展。
English: EEG-FM-Bench is introduced as the first comprehensive benchmark to standardize the evaluation of EEG foundation models, addressing current fragmentation by providing unified tasks, baseline results, and insights to enhance model performance and guide future development.
Authors:Andrew C. Freeman, Luke Reinkensmeyer
Abstract:
Recent years have brought about a surge in neuromorphic ``event'' video research, primarily targeting computer vision applications. Event video eschews video frames in favor of asynchronous, per-pixel intensity samples. While much work has focused on a handful of representations for specific event cameras, these representations have shown limitations in flexibility, speed, and compressibility. We previously proposed the unified ADDER representation to address these concerns. This paper introduces numerous improvements to the adder-viz software for visualizing real-time event transcode processes and applications in-the-loop. The MIT-licensed software is available from a centralized repository at https://github.com/ac-freeman/adder-codec-rs.
中文: 本文介绍了对adder-viz软件的改进,用于实时可视化神经形态事件视频转码过程,通过统一的ADDER表示法解决了现有方法在灵活性和压缩性方面的局限。
English: This paper presents enhancements to the adder-viz software for real-time visualization of neuromorphic event video transcoding, addressing limitations in existing representations through the unified ADDER approach.
Authors:Running Zhao, Zhihan Jiang, Xinchen Zhang, Chirui Chang, Handi Chen, Weipeng Deng, Luyao Jin, Xiaojuan Qi, Xun Qian, Edith C. H. Ngai
Abstract:
Users often take notes for instructional videos to access key knowledge later without revisiting long videos. Automated note generation tools enable users to obtain informative notes efficiently. However, notes generated by existing research or off-the-shelf tools fail to preserve the information conveyed in the original videos comprehensively, nor can they satisfy users' expectations for diverse presentation formats and interactive features when using notes digitally. In this work, we present NoteIt, a system, which automatically converts instructional videos to interactable notes using a novel pipeline that faithfully extracts hierarchical structure and multimodal key information from videos. With NoteIt's interface, users can interact with the system to further customize the content and presentation formats of the notes according to their preferences. We conducted both a technical evaluation and a comparison user study (N=36). The solid performance in objective metrics and the positive user feedback demonstrated the effectiveness of the pipeline and the overall usability of NoteIt. Project website: https://zhaorunning.github.io/NoteIt/
Authors:Adrian Arnaiz-Rodriguez, Nina Corvelo Benz, Suhas Thejaswi, Nuria Oliver, Manuel Gomez-Rodriguez
Abstract:
Data-driven algorithmic matching systems promise to help human decision makers make better matching decisions in a wide variety of high-stakes application domains, such as healthcare and social service provision. However, existing systems are not designed to achieve human-AI complementarity: decisions made by a human using an algorithmic matching system are not necessarily better than those made by the human or by the algorithm alone. Our work aims to address this gap. To this end, we propose collaborative matching (comatch), a data-driven algorithmic matching system that takes a collaborative approach: rather than making all the matching decisions for a matching task like existing systems, it selects only the decisions that it is the most confident in, deferring the rest to the human decision maker. In the process, comatch optimizes how many decisions it makes and how many it defers to the human decision maker to provably maximize performance. We conduct a large-scale human subject study with $800$ participants to validate the proposed approach. The results demonstrate that the matching outcomes produced by comatch outperform those generated by either human participants or by algorithmic matching on their own. The data gathered in our human subject study and an implementation of our system are available as open source at https://github.com/Networks-Learning/human-AI-complementarity-matching.
中文摘要:提出的协同匹配系统comatch通过将不确定的匹配决策交由人类处理,实现了优于单独人类或算法决策的匹配效果,大规模实验已验证其有效性。
English Summary: The proposed collaborative matching system, comatch, enhances decision-making by selectively deferring uncertain matches to humans, achieving superior performance over standalone human or algorithmic approaches as validated through a large-scale study.
Authors:Xiaohan Wang, Zhimin Li, Joshua A. Levine, Matthew Berger
Abstract:
Recently, neural surrogate models have emerged as a compelling alternative to traditional simulation workflows. This is accomplished by modeling the underlying function of scientific simulations, removing the need to run expensive simulations. Beyond just mapping from input parameter to output, surrogates have also been shown useful for inverse problems: output to input parameters. Inverse problems can be understood as search, where we aim to find parameters whose surrogate outputs contain a specified feature. Yet finding these parameters can be costly, especially for high-dimensional parameter spaces. Thus, existing surrogate-based solutions primarily focus on finding a small set of matching parameters, in the process overlooking the broader picture of plausible parameters. Our work aims to model and visualize the distribution of possible input parameters that produce a given output feature. To achieve this goal, we aim to address two challenges: (1) the approximation error inherent in the surrogate model and (2) forming the parameter distribution in an interactive manner. We model error via density estimation, reporting high density only if a given parameter configuration is close to training parameters, measured both over the input and output space. Our density estimate is used to form a prior belief on parameters, and when combined with a likelihood on features, gives us an efficient way to sample plausible parameter configurations that generate a target output feature. We demonstrate the usability of our solution through a visualization interface by performing feature-driven parameter analysis over the input parameter space of three simulation datasets. Source code is available at https://github.com/matthewberger/seeing-the-many
中文:神经代理模型通过近似科学函数替代传统模拟,有效解决逆问题并可视化生成特定输出特征的输入参数分布,同时处理近似误差并支持交互式分析。
English: Neural surrogate models offer an efficient alternative to traditional simulations by approximating scientific functions, enabling inverse problem solving and visualizing the distribution of input parameters that produce specific output features while addressing approximation errors and enabling interactive analysis.
Authors:Ronghao Lin, Shuai Shen, Weipeng Hu, Qiaolin He, Aolin Xiong, Li Huang, Haifeng Hu, Yap-peng Tan
Abstract:
Multimodal Empathetic Response Generation (MERG) is crucial for building emotionally intelligent human-computer interactions. Although large language models (LLMs) have improved text-based ERG, challenges remain in handling multimodal emotional content and maintaining identity consistency. Thus, we propose E3RG, an Explicit Emotion-driven Empathetic Response Generation System based on multimodal LLMs which decomposes MERG task into three parts: multimodal empathy understanding, empathy memory retrieval, and multimodal response generation. By integrating advanced expressive speech and video generative models, E3RG delivers natural, emotionally rich, and identity-consistent responses without extra training. Experiments validate the superiority of our system on both zero-shot and few-shot settings, securing Top-1 position in the Avatar-based Multimodal Empathy Challenge on ACM MM 25. Our code is available at https://github.com/RH-Lin/E3RG.
中文摘要:E3RG是一个基于多模态大语言模型的显式情感驱动系统,通过将共情响应生成分解为理解、记忆和生成三阶段,无需额外训练即可产生自然且情感一致的多模态回应,并在权威评测中取得最佳成绩。
English Summary: E3RG is an explicit emotion-driven system that enhances multimodal empathetic response generation by decomposing it into empathy understanding, memory retrieval, and response generation, achieving top performance without additional training.
Authors:Chi-Jung Lee, Jiaxin Li, Tianhong Catherine Yu, Ruidong Zhang, Vipin Gunda, François Guimbretière, Cheng Zhang
Abstract:
As computing devices become increasingly integrated into daily life, there is a growing need for intuitive, always-available interaction methods, even when users' hands are occupied. In this paper, we introduce Grab-n-Go, the first wearable device that leverages active acoustic sensing to recognize subtle hand microgestures while holding various objects. Unlike prior systems that focus solely on free-hand gestures or basic hand-object activity recognition, Grab-n-Go simultaneously captures information about hand microgestures, grasping poses, and object geometries using a single wristband, enabling the recognition of fine-grained hand movements occurring within activities involving occupied hands. A deep learning framework processes these complex signals to identify 30 distinct microgestures, with 6 microgestures for each of the 5 grasping poses. In a user study with 10 participants and 25 everyday objects, Grab-n-Go achieved an average recognition accuracy of 92.0%. A follow-up study further validated Grab-n-Go's robustness against 10 more challenging, deformable objects. These results underscore the potential of Grab-n-Go to provide seamless, unobtrusive interactions without requiring modifications to existing objects. The complete dataset, comprising data from 18 participants performing 30 microgestures with 35 distinct objects, is publicly available at https://github.com/cjlisalee/Grab-n-Go_Data with the DOI: https://doi.org/10.7298/7kbd-vv75.
中文摘要:Grab-n-Go是一种新型腕戴设备,通过主动声学传感技术,能在持握物体时精确识别手部微手势,在用户研究中实现了92%的识别准确率。
English Summary: Grab-n-Go is a novel wrist-worn device that uses active acoustic sensing to accurately recognize hand microgestures while holding objects, achieving 92% recognition accuracy in user studies.
Authors:Ojas Shirekar, Wim Pouw, Chenxu Hao, Vrushank Phadnis, Thabo Beeler, Chirag Raman
Abstract:
Digital humans are emerging as autonomous agents in multiparty interactions, yet existing evaluation metrics largely ignore contextual coordination dynamics. We introduce a unified, intervention-driven framework for objective assessment of multiparty social behaviour in skeletal motion data, spanning three complementary dimensions: (1) synchrony via Cross-Recurrence Quantification Analysis, (2) temporal alignment via Multiscale Empirical Mode Decompositionbased Beat Consistency, and (3) structural similarity via Soft Dynamic Time Warping. We validate metric sensitivity through three theory-driven perturbations -- gesture kinematic dampening, uniform speech-gesture delays, and prosodic pitch-variance reduction-applied to $\approx 145$ 30-second thin slices of group interactions from the DnD dataset. Mixed-effects analyses reveal predictable, joint-independent shifts: dampening increases CRQA determinism and reduces beat consistency, delays weaken cross-participant coupling, and pitch flattening elevates F0 Soft-DTW costs. A complementary perception study ($N=27$) compares judgments of full-video and skeleton-only renderings to quantify representation effects. Our three measures deliver orthogonal insights into spatial structure, timing alignment, and behavioural variability. Thereby forming a robust toolkit for evaluating and refining socially intelligent agents. Code available on \href{https://github.com/tapri-lab/gig-interveners}{GitHub}.
中文: 本文提出了一种基于干预的统一框架,通过同步性、时间对齐和结构相似性三个互补维度,客观评估骨骼运动数据中的多方社交行为,并利用理论驱动的干扰和感知研究验证了其有效性,为评估社交智能体提供了可靠工具集。
English: This paper introduces an intervention-driven framework to objectively assess multiparty social behavior in skeletal motion data through three complementary metrics—synchrony, temporal alignment, and structural similarity—validated via theory-driven perturbations and perceptual studies, forming a robust toolkit for evaluating socially intelligent agents.
Authors:Helbert Paat, Guohao Shen
Abstract:
Decision support systems are designed to assist human experts in classification tasks by providing conformal prediction sets derived from a pre-trained model. This human-AI collaboration has demonstrated enhanced classification performance compared to using either the model or the expert independently. In this study, we focus on the selection of instance-specific experts from a pool of multiple human experts, contrasting it with existing research that typically focuses on single-expert scenarios. We characterize the conditions under which multiple experts can benefit from the conformal sets. With the insight that only certain experts may be relevant for each instance, we explore the problem of subset selection and introduce a greedy algorithm that utilizes conformal sets to identify the subset of expert predictions that will be used in classifying an instance. This approach is shown to yield better performance compared to naive methods for human subset selection. Based on real expert predictions from the CIFAR-10H and ImageNet-16H datasets, our simulation study indicates that our proposed greedy algorithm achieves near-optimal subsets, resulting in improved classification performance among multiple experts.
Chinese: 本研究提出了一种贪心算法,利用保形预测集从多位专家中优化选择子集进行分类,在CIFAR-10H和ImageNet-16H数据集上的模拟实验表明,该方法优于简单选择策略并提升了分类性能。
English: This study introduces a greedy algorithm that leverages conformal prediction sets to optimally select subsets of human experts for classification tasks, demonstrating improved performance over naive methods in simulations using CIFAR-10H and ImageNet-16H datasets.
Authors:Connor Bailey, Michael Gleicher
Abstract:
We explore the effects of data and design considerations through the example case of part-to-whole data relationships. Standard part-to-whole representations like pie charts and stacked bar charts make the relationships of parts to the whole explicit. Value estimation in these charts benefits from two perceptual mechanisms: anchoring, where the value is close to a reference value with an easily recognized shape, and alignment where the beginning or end of the shape is aligned with a marker. In an online study, we explore how data and design factors such as value, position, and encoding together impact these effects in making estimations in part-to-whole charts. The results show how salient values and alignment to positions on a scale affect task performance. This demonstrates the need for informed visualization design based around how data properties and design factors affect perceptual mechanisms.
Chinese: 本研究探讨了饼图等部分与整体可视化中数据属性和设计元素如何通过锚定和对齐等感知机制影响数值估计,强调了基于感知原理进行可视化设计对提升任务性能的重要性。
English: This study examines how data properties and design elements in part-to-whole visualizations like pie charts influence value estimation through perceptual mechanisms such as anchoring and alignment, highlighting the importance of informed design choices for optimal task performance.
Authors:Zheng Lian
Abstract:
Open-Vocabulary Multimodal Emotion Recognition (OV-MER) aims to predict emotions without being constrained by predefined label spaces, enabling fine-grained and human-like emotion understanding. Unlike traditional discriminative methods, OV-MER leverages generative models, such as large language models (LLMs) with extensive vocabularies, to capture the full spectrum of emotions. Previous approaches (like AffectGPT) primarily rely on token-level loss for training. However, this objective does not align with the emotion wheel (EW)-based evaluation metrics used in OV-MER. Unfortunately, EW-based metrics cannot be directly optimized via gradient backpropagation. In this paper, we propose AffectGPT-R1, a reinforcement learning framework that directly optimizes performance on EW-based metrics. Specifically, we treat these metrics as the reward function and employ Group Relative Policy Optimization (GRPO) to maximize rewards. Experimental results demonstrate that AffectGPT-R1 achieves significant improvements on OV-MER. We hope this work advances the field of multimodal emotion recognition. Our code will be publicly available at:https://github.com/zeroQiaoba/AffectGPT.
Chinese: 提出的AffectGPT-R1框架通过强化学习直接优化基于情绪轮的评估指标,在开放词汇多模态情绪识别任务中取得了显著提升。
English: The proposed AffectGPT-R1 framework uses reinforcement learning to directly optimize emotion wheel-based metrics, achieving significant improvements in open-vocabulary multimodal emotion recognition.
Authors:Jiankai Tang, Meng Kang, Yiru Zhang, Kegang Wang, Daniel Mcduff, Xin Liu, Yuanchun Shi, Yuntao Wang
Abstract:
Cardiorespiratory coupling (CRC) captures the dynamic interaction between the cardiac and respiratory systems--an interaction strengthened by physical exercise and linked to improved physiological function. We examined CRC at high altitude in two states, rest and post-exercise recovery, and found significant differences (p < 0.05). Quantitative analysis revealed that recovery involved more frequent yet less stable episodes of synchronization between respiration and pulse. Furthermore, we explored the feasibility of non-contact CRC measurement with remote photoplethysmography (rPPG), observing a strong correlation with oximeter-based metrics (Pearson r = 0.96). These findings highlight the potential of CRC as a sensitive marker for autonomic regulation and its future application in contactless monitoring. Source code is available at GitHub: https://github.com/McJackTang/CRC.
中文: 本研究表明心肺耦合(CRC)可作为自主神经调节的敏感指标,在高海拔运动后恢复期表现出更频繁但不稳定的同步性,并验证了使用远程光电容积描记法进行非接触式CRC测量的可行性,与传统方法高度相关。
English: This study demonstrates that cardiorespiratory coupling (CRC) serves as a sensitive indicator of autonomic regulation, with recovery after exercise at high altitude showing more frequent but less stable synchronization, and validates non-contact CRC measurement using remote photoplethysmography as highly correlated with traditional methods.
Authors:Jan Simson
Abstract:
Interactive data visualization is a major part of modern exploratory data analysis, with web-based technologies enabling a rich ecosystem of both specialized and general tools. However, current visualization tools often lack support for transformation or wrangling of data and are forced to re-implement their own solutions to load and ingest data. This redundancy creates substantial development overhead for tool creators, steeper learning curves for users who must master different data handling interfaces across tools and a degraded user experience as data handling is usually seen as an after-thought.
We propose a modular approach that separates data wrangling and loading capabilities from visualization components. This architecture allows visualization tools to concentrate on their core strengths while providing the opportunity to develop a unified, powerful interface for data handling. An additional benefit of this approach is that it allows for multiple tools to exist and be used side by side. We demonstrate the feasibility of this approach by building an early prototype using web technologies to encapsulate visualization tools and manage data flow between them.
We discuss future research directions, including downstream integrations with other tooling, such as IDEs, literate programming notebooks and applications, as well as incorporation of new technologies for efficient data transformations. We seek input from the community to better understand the requirements towards this approach.
中文摘要:该摘要提出了一种将数据整理与可视化工具分离的模块化架构,通过基于网络的原型验证了其可行性,旨在减少冗余并提升用户体验。
English Summary: The abstract proposes a modular architecture that separates data wrangling from visualization tools to reduce redundancy and improve user experience, demonstrating its feasibility through a web-based prototype.
Authors:Kazushi Kato, Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara
Abstract:
In human dialogue, nonverbal information such as nodding and facial expressions is as crucial as verbal information, and spoken dialogue systems are also expected to express such nonverbal behaviors. We focus on nodding, which is critical in an attentive listening system, and propose a model that predicts both its timing and type in real time. The proposed model builds on the voice activity projection (VAP) model, which predicts voice activity from both listener and speaker audio. We extend it to prediction of various types of nodding in a continuous and real-time manner unlike conventional models. In addition, the proposed model incorporates multi-task learning with verbal backchannel prediction and pretraining on general dialogue data. In the timing and type prediction task, the effectiveness of multi-task learning was significantly demonstrated. We confirmed that reducing the processing rate enables real-time operation without a substantial drop in accuracy, and integrated the model into an avatar attentive listening system. Subjective evaluations showed that it outperformed the conventional method, which always does nodding in sync with verbal backchannel. The code and trained models are available at https://github.com/MaAI-Kyoto/MaAI.
中文摘要:本研究提出了一种实时点头预测模型,通过扩展VAP模型并采用多任务学习,能连续预测点头时机与类型,在专注倾听系统中表现优于传统同步点头方法。
English Summary: This study introduces a real-time nodding prediction model for attentive listening systems, which extends the VAP model to continuously predict nodding timing and types using multi-task learning and achieves better performance than conventional methods.
Authors:Harry Shomer, Jiejun Xu
Abstract:
Label placement is a critical aspect of map design, serving as a form of spatial annotation that directly impacts clarity and interpretability. Despite its importance, label placement remains largely manual and difficult to scale, as existing automated systems struggle to integrate cartographic conventions, adapt to context, or interpret labeling instructions. In this work, we introduce a new paradigm for automatic label placement (ALP) that formulates the task as a data editing problem and leverages large language models (LLMs) for context-aware spatial annotation. To support this direction, we curate MAPLE, the first known benchmarking dataset for evaluating ALP on real-world maps, encompassing diverse landmark types and label placement annotations from open-source data. Our method retrieves labeling guidelines relevant to each landmark type leveraging retrieval-augmented generation (RAG), integrates them into prompts, and employs instruction-tuned LLMs to generate ideal label coordinates. We evaluate four open-source LLMs on MAPLE, analyzing both overall performance and generalization across different types of landmarks. This includes both zero-shot and instruction-tuned performance. Our results demonstrate that LLMs, when guided by structured prompts and domain-specific retrieval, can learn to perform accurate spatial edits, aligning the generated outputs with expert cartographic standards. Overall, our work presents a scalable framework for AI-assisted map finishing and demonstrates the potential of foundation models in structured data editing tasks. The code and data can be found at https://github.com/HarryShomer/MAPLE.
中文摘要:本研究提出了一种利用大型语言模型结合制图规范的新型自动标签放置方法,通过构建MAPLE基准数据集实现了符合专业标准的地图空间标注。
English Summary: This paper introduces a novel automatic label placement method using large language models guided by cartographic rules, achieving expert-level spatial annotations through a curated benchmark dataset called MAPLE.
Authors:Mykola Maslych, Mohammadreza Katebi, Christopher Lee, Yahya Hmaiti, Amirpouya Ghasemaghaei, Christian Pumarada, Janneese Palmer, Esteban Segarra Martinez, Marco Emporio, Warren Snipes, Ryan P. McMahan, Joseph J. LaViola
Abstract:
We investigated the challenges of mitigating response delays in free-form conversations with virtual agents powered by Large Language Models (LLMs) within Virtual Reality (VR). For this, we used conversational fillers, such as gestures and verbal cues, to bridge delays between user input and system responses and evaluate their effectiveness across various latency levels and interaction scenarios. We found that latency above 4 seconds degrades quality of experience, while natural conversational fillers improve perceived response time, especially in high-delay conditions. Our findings provide insights for practitioners and researchers to optimize user engagement whenever conversational systems' responses are delayed by network limitations or slow hardware. We also contribute an open-source pipeline that streamlines deploying conversational agents in virtual environments.
中文: 本研究探讨了在VR环境中使用如手势和言语提示等对话填充物来缓解大语言模型响应延迟的问题,发现超过4秒的延迟会损害用户体验,而自然的填充物能提升感知响应性,并为优化延迟交互提供了见解和开源工具链。
English: This study explores using conversational fillers like gestures and verbal cues to mitigate response delays in VR-based LLM conversations, finding that delays over 4 seconds harm user experience while natural fillers improve perceived responsiveness, with insights and an open-source pipeline provided for optimizing delayed interactions.
Authors:Phuc Truong Loc Nguyen, Thanh Hung Do
Abstract:
AI-assisted gait analysis holds promise for improving Parkinson's Disease (PD) care, but current clinical dashboards lack transparency and offer no meaningful way for clinicians to interrogate or contest AI decisions. We present Con-GaIT (Contestable Gait Interpretation & Tracking), a clinician-centered system that advances Contestable AI through a tightly integrated interface designed for interpretability, oversight, and procedural recourse. Grounded in HCI principles, ConGaIT enables structured disagreement via a novel Contest & Justify interaction pattern, supported by visual explanations, role-based feedback, and traceable justification logs. Evaluated using the Contestability Assessment Score (CAS), the framework achieves a score of 0.970, demonstrating that contestability can be operationalized through human-centered design in compliance with emerging regulatory standards. A demonstration of the framework is available at https://github.com/hungdothanh/Con-GaIT.
中文: Con-GaIT系统以临床医生为核心,通过可解释性设计和争议机制提升帕金森病步态分析的透明度,其人本设计实现了高度可争议性评估分数。
English: Con-GaIT introduces a clinician-centered AI system that enhances gait analysis for Parkinson's Disease by integrating interpretability features and a contestability framework, achieving high contestability scores through human-centered design.
Authors:Liwenhan Xie, Jiayi Zhou, Anyi Rao, Huamin Qu, Xinhuan Shu
Abstract:
Animating metaphoric visualizations brings data to life, enhancing the comprehension of abstract data encodings and fostering deeper engagement. However, creators face significant challenges in designing these animations, such as crafting motions that align semantically with the metaphors, maintaining faithful data representation during animation, and seamlessly integrating interactivity. We propose a human-AI co-creation workflow that facilitates creating animations for SVG-based metaphoric visualizations. Users can initially derive animation clips for data elements from vision-language models (VLMs) and subsequently coordinate their timelines based on entity order, attribute values, spatial layout, or randomness. Our design decisions were informed by a formative study with experienced designers (N=8). We further developed a prototype, DataSway, and conducted a user study (N=14) to evaluate its creativity support and usability. A gallery with 6 cases demonstrates its capabilities and applications in web-based hypermedia. We conclude with implications for future research on bespoke data visualization animation.
Authors:Anushka Debnath, Stephen Cranefield, Emiliano Lorini, Bastin Tony Roy Savarimuthu
Abstract:
In human society, trust is an essential component of social attitude that helps build and maintain long-term, healthy relationships which creates a strong foundation for cooperation, enabling individuals to work together effectively and achieve shared goals. As many human interactions occur through electronic means such as using mobile apps, the potential arises for AI systems to assist users in understanding the social state of their relationships. In this paper we investigate the ability of Large Language Models (LLMs) to reason about trust between two individuals in an environment which requires fostering trust relationships. We also assess whether LLMs are capable of inducing trust by role-playing one party in a trust based interaction and planning actions which can instil trust.
Authors:Chieh-Yun Chen, Min Shi, Gong Zhang, Humphrey Shi
Abstract:
Text-to-Image (T2I) generative models have revolutionized content creation but remain highly sensitive to prompt phrasing, often requiring users to repeatedly refine prompts multiple times without clear feedback. While techniques such as automatic prompt engineering, controlled text embeddings, denoising, and multi-turn generation mitigate these issues, they offer limited controllability, or often necessitate additional training, restricting the generalization abilities. Thus, we introduce T2I-Copilot, a training-free multi-agent system that leverages collaboration between (Multimodal) Large Language Models to automate prompt phrasing, model selection, and iterative refinement. This approach significantly simplifies prompt engineering while enhancing generation quality and text-image alignment compared to direct generation. Specifically, T2I-Copilot consists of three agents: (1) Input Interpreter, which parses the input prompt, resolves ambiguities, and generates a standardized report; (2) Generation Engine, which selects the appropriate model from different types of T2I models and organizes visual and textual prompts to initiate generation; and (3) Quality Evaluator, which assesses aesthetic quality and text-image alignment, providing scores and feedback for potential regeneration. T2I-Copilot can operate fully autonomously while also supporting human-in-the-loop intervention for fine-grained control. On GenAI-Bench, using open-source generation models, T2I-Copilot achieves a VQA score comparable to commercial models RecraftV3 and Imagen 3, surpasses FLUX1.1-pro by 6.17% at only 16.59% of its cost, and outperforms FLUX.1-dev and SD 3.5 Large by 9.11% and 6.36%. Code will be released at: https://github.com/SHI-Labs/T2I-Copilot.
中文: T2I-Copilot是一种免训练的多智能体系统,通过多模态大语言模型协作自动优化提示词和选择模型,在显著提升生成质量与图文对齐度的同时大幅降低成本。
English: T2I-Copilot is a training-free multi-agent system that automates prompt engineering and model selection through collaboration between multimodal large language models, significantly improving generation quality and text-image alignment while reducing costs.
Authors:Zheng Wei, Hongtao Wu, Lvmin Zhang, Xian Xu, Yefeng Zheng, Pan Hui, Maneesh Agrawala, Huamin Qu, Anyi Rao
Abstract:
Effective communication between directors and cinematographers is fundamental in film production, yet traditional approaches relying on visual references and hand-drawn storyboards often lack the efficiency and precision necessary during pre-production. We present CineVision, an AI-driven platform that integrates scriptwriting with real-time visual pre-visualization to bridge this communication gap. By offering dynamic lighting control, style emulation based on renowned filmmakers, and customizable character design, CineVision enables directors to convey their creative vision with heightened clarity and rapidly iterate on scene composition. In a 24-participant lab study, CineVision yielded shorter task times and higher usability ratings than two baseline methods, suggesting a potential to ease early-stage communication and accelerate storyboard drafts under controlled conditions. These findings underscore CineVision's potential to streamline pre-production processes and foster deeper creative synergy among filmmaking teams, particularly for new collaborators. Our code and demo are available at https://github.com/TonyHongtaoWu/CineVision.
中文: CineVision 是一个人工智能驱动的平台,通过将剧本创作与实时可视化预览相结合,提供动态灯光控制和风格模拟功能,有效改善导演与摄影师之间的沟通,提升前期制作效率和创意协作。
English: CineVision is an AI-driven platform that enhances director-cinematographer communication by integrating scriptwriting with real-time visual pre-visualization, featuring dynamic lighting and style emulation to streamline pre-production and improve creative synergy.
Authors:Parsa Vares, Ãloi Durant, Jun Pang, Nicolas Médoc, Mohammad Ghoniem
Abstract:
Thompson Sampling (TS) and its variants are powerful Multi-Armed Bandit algorithms used to balance exploration and exploitation strategies in active learning. Yet, their probabilistic nature often turns them into a "black box", hindering debugging and trust. We introduce TS-Insight, a visual analytics tool explicitly designed to shed light on the internal decision mechanisms of Thompson Sampling-based algorithms, for model developers. It comprises multiple plots, tracing for each arm the evolving posteriors, evidence counts, and sampling outcomes, enabling the verification, diagnosis, and explainability of exploration/exploitation dynamics. This tool aims at fostering trust and facilitating effective debugging and deployment in complex binary decision-making scenarios especially in sensitive domains requiring interpretable decision-making.
中文: TS-Insight是一款可视化分析工具,通过多图展示汤普森采样算法的内部决策机制,增强信任并促进在敏感领域中的有效调试。
English: TS-Insight is a visual analytics tool that reveals the internal decision mechanisms of Thompson Sampling algorithms through multiple plots, enhancing trust and enabling effective debugging in sensitive domains.
Authors:Jovana Kondic, Pengyuan Li, Dhiraj Joshi, Zexue He, Shafiq Abedin, Jennifer Sun, Ben Wiesel, Eli Schwartz, Ahmed Nassar, Bo Wu, Assaf Arbelle, Aude Oliva, Dan Gutfreund, Leonid Karlinsky, Rogerio Feris
Abstract:
Chart-to-code reconstruction -- the task of recovering executable plotting scripts from chart images -- provides important insights into a model's ability to ground data visualizations in precise, machine-readable form. Yet many existing multimodal benchmarks largely focus primarily on answering questions about charts or summarizing them. To bridge this gap, we present ChartGen, a fully-automated pipeline for code-guided synthetic chart generation. Starting from seed chart images, ChartGen (i) prompts a vision-language model (VLM) to reconstruct each image into a python script, and (ii) iteratively augments that script with a code-oriented large language model (LLM). Using ChartGen, we create 222.5K unique chart-image code pairs from 13K seed chart images, and present an open-source synthetic chart dataset covering 27 chart types, 11 plotting libraries, and multiple data modalities (image, code, text, CSV, DocTags). From this corpus, we curate a held-out chart-to-code evaluation subset of 4.3K chart image-code pairs, and evaluate six open-weight VLMs (3B - 26B parameters), highlighting substantial room for progress. We release the pipeline, prompts, and the dataset to help accelerate efforts towards robust chart understanding and vision-conditioned code generation: https://github.com/SD122025/ChartGen/
中文:本文提出了ChartGen,一个自动化生成图表-代码对的流程,旨在填补多模态基准在图表到代码重建任务上的空白,通过创建包含多种图表类型和绘图库的数据集,并评估了多个视觉语言模型的性能。
English: This paper introduces ChartGen, an automated pipeline that generates synthetic chart-image code pairs to advance chart-to-code reconstruction, addressing a gap in multimodal benchmarks by creating a comprehensive dataset and evaluating several vision-language models.
Authors:Xuetian Chen, Yinghao Chen, Xinfeng Yuan, Zhuo Peng, Lu Chen, Yuekeng Li, Zhoujia Zhang, Yingqian Huang, Leyan Huang, Jiaqing Liang, Tianbao Xie, Zhiyong Wu, Qiushi Sun, Biqing Qi, Bowen Zhou
Abstract:
Computer-using agents have shown strong potential to boost human productivity and enable new application forms across platforms. While recent advances have led to usable applications, existing benchmarks fail to account for the internal task heterogeneity and the corresponding agent capabilities, as well as their alignment with actual user demands-hindering both targeted capability development and the reliable transition of research progress into practical deployment. To bridge the gap, we present OS-MAP, a benchmark for daily computer-using automation that organizes its 416 realistic tasks across 15 applications along two key dimensions: a five-level taxonomy of automation and a generalization scope derived from a real-world user demand hierarchy. To enable fine-grained analysis of required capabilities and alignment with real-world scenarios, OS-MAP evaluates agents along two dimensions: automation level across a five-level taxonomy, and generalization scope across a demand hierarchy. This design captures varying levels of required agent autonomy and generalization, forming a performance-generalization evaluation matrix for structured and comprehensive assessment. Experiments show that even State-of-the-Art agents with VLM backbones struggle with higher-level tasks involving perception, reasoning, and coordination-highlighting the need for a deeper understanding of current strengths and limitations to drive the future progress in computer-using agents research and deployment. All code, environments, baselines, and data are publicly available at https://github.com/OS-Copilot/OS-Map.
中文: 计算机使用代理虽能提升生产力,但现有基准未能匹配实际需求,因此我们推出OS-MAP基准,通过自动化分级和泛化范围评估代理能力,揭示其在高阶任务中的不足,以推动研究与应用发展。
English: Computer-using agents show promise for productivity but face challenges in aligning capabilities with real-world tasks, prompting the introduction of OS-MAP, a benchmark that evaluates agents across automation levels and generalization scopes to address these gaps and guide future development.
Authors:Chenyu Su, Weiwei Shang, Chen Qian, Fei Zhang, Shuang Cong
Abstract:
Semantics-driven 3D spatial constraints align highlevel semantic representations with low-level action spaces, facilitating the unification of task understanding and execution in robotic manipulation. The synergistic reasoning of Multimodal Large Language Models (MLLMs) and Vision Foundation Models (VFMs) enables cross-modal 3D spatial constraint construction. Nevertheless, existing methods have three key limitations: (1) coarse semantic granularity in constraint modeling, (2) lack of real-time closed-loop planning, (3) compromised robustness in semantically diverse environments. To address these challenges, we propose ReSem3D, a unified manipulation framework for semantically diverse environments, leveraging the synergy between VFMs and MLLMs to achieve fine-grained visual grounding and dynamically constructs hierarchical 3D spatial constraints for real-time manipulation. Specifically, the framework is driven by hierarchical recursive reasoning in MLLMs, which interact with VFMs to automatically construct 3D spatial constraints from natural language instructions and RGB-D observations in two stages: part-level extraction and region-level refinement. Subsequently, these constraints are encoded as real-time optimization objectives in joint space, enabling reactive behavior to dynamic disturbances. Extensive simulation and real-world experiments are conducted in semantically rich household and sparse chemical lab environments. The results demonstrate that ReSem3D performs diverse manipulation tasks under zero-shot conditions, exhibiting strong adaptability and generalization. Code and videos are available at https://github.com/scy-v/ReSem3D and https://resem3d.github.io.
中文: ReSem3D框架通过多模态AI模型的协同作用,从自然语言指令构建精细化的3D空间约束,实现在多样化环境中的实时自适应机器人操作。
English: ReSem3D is a robotic manipulation framework that leverages multimodal AI models to create fine-grained 3D spatial constraints from natural language, enabling real-time adaptive task execution in diverse environments.
Authors:Xiaofeng Mao, Shaoheng Lin, Zhen Li, Chuanhao Li, Wenshuo Peng, Tong He, Jiangmiao Pang, Mingmin Chi, Yu Qiao, Kaipeng Zhang
Abstract:
Yume aims to use images, text, or videos to create an interactive, realistic, and dynamic world, which allows exploration and control using peripheral devices or neural signals. In this report, we present a preview version of \method, which creates a dynamic world from an input image and allows exploration of the world using keyboard actions. To achieve this high-fidelity and interactive video world generation, we introduce a well-designed framework, which consists of four main components, including camera motion quantization, video generation architecture, advanced sampler, and model acceleration. First, we quantize camera motions for stable training and user-friendly interaction using keyboard inputs. Then, we introduce the Masked Video Diffusion Transformer~(MVDT) with a memory module for infinite video generation in an autoregressive manner. After that, training-free Anti-Artifact Mechanism (AAM) and Time Travel Sampling based on Stochastic Differential Equations (TTS-SDE) are introduced to the sampler for better visual quality and more precise control. Moreover, we investigate model acceleration by synergistic optimization of adversarial distillation and caching mechanisms. We use the high-quality world exploration dataset \sekai to train \method, and it achieves remarkable results in diverse scenes and applications. All data, codebase, and model weights are available on https://github.com/stdstu12/YUME. Yume will update monthly to achieve its original goal. Project page: https://stdstu12.github.io/YUME-Project/.
中文: Yume是一个通过图像生成动态虚拟世界的交互系统,其框架融合了相机运动量化、视频扩散变换器和抗伪影机制,支持用户通过键盘操作探索虚拟环境。
English: Yume is an interactive system that generates dynamic virtual worlds from images, enabling exploration via keyboard controls through a framework integrating camera motion quantization, video diffusion transformers, and artifact reduction techniques.
Authors:Jiahao Tang, Youjun Li, Xiangting Fan, Yangxuan Zheng, Siyuan Lu, Xueping Li, Peng Fang, Chenxi Li, Zi-Gang Huang
Abstract:
Emotion recognition based on electroencephalography (EEG) holds significant promise for affective brain-computer interfaces (aBCIs). However, its practical deployment faces challenges due to the variability within inter-subject and the scarcity of labeled data in target domains. To overcome these limitations, we propose SDC-Net, a novel Semantic-Dynamic Consistency domain adaptation network for fully label-free cross-subject EEG emotion recognition. First, we introduce a Same-Subject Same-Trial Mixup strategy that generates augmented samples through intra-trial interpolation, enhancing data diversity while explicitly preserving individual identity to mitigate label ambiguity. Second, we construct a dynamic distribution alignment module within the Reproducing Kernel Hilbert Space (RKHS), jointly aligning marginal and conditional distributions through multi-objective kernel mean embedding, and leveraging a confidence-aware pseudo-labeling strategy to ensure stable adaptation. Third, we propose a dual-domain similarity consistency learning mechanism that enforces cross-domain structural constraints based on latent pairwise similarities, facilitating semantic boundary learning without reliance on temporal synchronization or label priors. To validate the effectiveness and robustness of the proposed SDC-Net, extensive experiments are conducted on three widely used EEG benchmark datasets: SEED, SEED-IV, and FACED. Comparative results against existing unsupervised domain adaptation methods demonstrate that SDC-Net achieves state-of-the-art performance in emotion recognition under both cross-subject and cross-session conditions. This advancement significantly improves the accuracy and generalization capability of emotion decoding, laying a solid foundation for real-world applications of personalized aBCIs. The source code is available at: https://github.com/XuanSuTrum/SDC-Net.
Chinese: 提出的SDC-Net通过混合增强策略、动态分布对齐和双域一致性学习,实现了无需标注数据的跨被试脑电情绪识别,在多个基准数据集上取得了最优性能。
English: The proposed SDC-Net introduces a domain adaptation framework for cross-subject EEG emotion recognition that employs mixup strategies, dynamic distribution alignment, and dual-domain consistency learning to achieve state-of-the-art performance without requiring labeled target data.
Authors:Ruijie Zhu, Mulin Yu, Linning Xu, Lihan Jiang, Yixuan Li, Tianzhu Zhang, Jiangmiao Pang, Bo Dai
Abstract:
3D Gaussian Splatting is renowned for its high-fidelity reconstructions and real-time novel view synthesis, yet its lack of semantic understanding limits object-level perception. In this work, we propose ObjectGS, an object-aware framework that unifies 3D scene reconstruction with semantic understanding. Instead of treating the scene as a unified whole, ObjectGS models individual objects as local anchors that generate neural Gaussians and share object IDs, enabling precise object-level reconstruction. During training, we dynamically grow or prune these anchors and optimize their features, while a one-hot ID encoding with a classification loss enforces clear semantic constraints. We show through extensive experiments that ObjectGS not only outperforms state-of-the-art methods on open-vocabulary and panoptic segmentation tasks, but also integrates seamlessly with applications like mesh extraction and scene editing. Project page: https://ruijiezhu94.github.io/ObjectGS_page
中文: ObjectGS提出了一种对象感知框架,通过将语义理解融入3D高斯泼溅技术,在分割任务和场景编辑等应用中实现了更优的性能。
English: ObjectGS introduces an object-aware framework that enhances 3D Gaussian Splatting by incorporating semantic understanding, enabling superior performance in segmentation tasks and practical applications like scene editing.
Authors:Kobi Hackenburg, Ben M. Tappin, Luke Hewitt, Ed Saunders, Sid Black, Hause Lin, Catherine Fist, Helen Margetts, David G. Rand, Christopher Summerfield
Abstract:
There are widespread fears that conversational AI could soon exert unprecedented influence over human beliefs. Here, in three large-scale experiments (N=76,977), we deployed 19 LLMs-including some post-trained explicitly for persuasion-to evaluate their persuasiveness on 707 political issues. We then checked the factual accuracy of 466,769 resulting LLM claims. Contrary to popular concerns, we show that the persuasive power of current and near-future AI is likely to stem more from post-training and prompting methods-which boosted persuasiveness by as much as 51% and 27% respectively-than from personalization or increasing model scale. We further show that these methods increased persuasion by exploiting LLMs' unique ability to rapidly access and strategically deploy information and that, strikingly, where they increased AI persuasiveness they also systematically decreased factual accuracy.
Chinese: 研究表明,当前对话式AI的说服力主要源于后训练和提示技术,这些方法虽大幅提升了说服效果,却系统性地降低了事实准确性,而非个性化或模型规模扩大所致。
English: Current research reveals that the persuasive power of conversational AI primarily stems from post-training and prompting techniques, which significantly enhance persuasiveness while simultaneously reducing factual accuracy, rather than from personalization or model scaling.
Authors:Lotfi El Hafi, Kazuma Onishi, Shoichi Hasegawa, Akira Oyama, Tomochika Ishikawa, Masashi Osada, Carl Tornberg, Ryoma Kado, Kento Murata, Saki Hashimoto, Sebastian Carrera Villalobos, Akira Taniguchi, Gustavo Alfonso Garcia Ricardez, Yoshinobu Hagiwara, Tatsuya Aoki, Kensuke Iwata, Takato Horii, Yukiko Horikawa, Takahiro Miyashita, Tadahiro Taniguchi, Hiroshi Ishiguro
Abstract:
Cybernetic avatars (CAs) are key components of an avatar-symbiotic society, enabling individuals to overcome physical limitations through virtual agents and robotic assistants. While semi-autonomous CAs intermittently require human teleoperation and supervision, the deployment of fully autonomous CAs remains a challenge. This study evaluates public perception and potential social impacts of fully autonomous CAs for physical support in daily life. To this end, we conducted a large-scale demonstration and survey during Avatar Land, a 19-day public event in Osaka, Japan, where fully autonomous robotic CAs, alongside semi-autonomous CAs, performed daily object retrieval tasks. Specifically, we analyzed responses from 2,285 visitors who engaged with various CAs, including a subset of 333 participants who interacted with fully autonomous CAs and shared their perceptions and concerns through a survey questionnaire. The survey results indicate interest in CAs for physical support in daily life and at work. However, concerns were raised regarding task execution reliability. In contrast, cost and human-like interaction were not dominant concerns. Project page: https://lotfielhafi.github.io/FACA-Survey/.
中文: 本研究评估了公众对全自主控制论化身在日常物理支持方面的兴趣与担忧,发现人们对其应用抱有热情,但对其任务执行可靠性存在显著忧虑,而成本和拟人化交互则非主要关注点。
English: This study assesses public interest and concerns about fully autonomous cybernetic avatars for daily physical support, finding enthusiasm for their use but significant worries about reliability, while cost and human-like interaction are less prominent issues.
Authors:Kuangshi Ai, Kaiyuan Tang, Chaoli Wang
Abstract:
Traditional volume visualization (VolVis) methods, like direct volume rendering, suffer from rigid transfer function designs and high computational costs. Although novel view synthesis approaches enhance rendering efficiency, they require additional learning effort for non-experts and lack support for semantic-level interaction. To bridge this gap, we propose NLI4VolVis, an interactive system that enables users to explore, query, and edit volumetric scenes using natural language. NLI4VolVis integrates multi-view semantic segmentation and vision-language models to extract and understand semantic components in a scene. We introduce a multi-agent large language model architecture equipped with extensive function-calling tools to interpret user intents and execute visualization tasks. The agents leverage external tools and declarative VolVis commands to interact with the VolVis engine powered by 3D editable Gaussians, enabling open-vocabulary object querying, real-time scene editing, best-view selection, and 2D stylization. We validate our system through case studies and a user study, highlighting its improved accessibility and usability in volumetric data exploration. We strongly recommend readers check our case studies, demo video, and source code at https://nli4volvis.github.io/.
Authors:Paulo Salem, Robert Sim, Christopher Olsen, Prerit Saxena, Rafael Barcelos, Yi Ding
Abstract:
Recent advances in Large Language Models (LLM) have led to a new class of autonomous agents, renewing and expanding interest in the area. LLM-powered Multiagent Systems (MAS) have thus emerged, both for assistive and simulation purposes, yet tools for realistic human behavior simulation -- with its distinctive challenges and opportunities -- remain underdeveloped. Existing MAS libraries and tools lack fine-grained persona specifications, population sampling facilities, experimentation support, and integrated validation, among other key capabilities, limiting their utility for behavioral studies, social simulation, and related applications. To address these deficiencies, in this work we introduce TinyTroupe, a simulation toolkit enabling detailed persona definitions (e.g., nationality, age, occupation, personality, beliefs, behaviors) and programmatic control via numerous LLM-driven mechanisms. This allows for the concise formulation of behavioral problems of practical interest, either at the individual or group level, and provides effective means for their solution. TinyTroupe's components are presented using representative working examples, such as brainstorming and market research sessions, thereby simultaneously clarifying their purpose and demonstrating their usefulness. Quantitative and qualitative evaluations of selected aspects are also provided, highlighting possibilities, limitations, and trade-offs. The approach, though realized as a specific Python implementation, is meant as a novel conceptual contribution, which can be partially or fully incorporated in other contexts. The library is available as open source at https://github.com/microsoft/tinytroupe.
Chinese Summary: TinyTroupe作为一种新型模拟工具包,通过基于大语言模型的驱动机制实现精细人物角色定义和程序化控制,解决了现有多智能体系统在行为模拟方面的不足,为社会科学研究和市场分析等应用提供了有效解决方案。
English Summary: TinyTroupe is a new simulation toolkit that addresses the limitations of existing multiagent systems by enabling detailed persona specifications and programmatic control through LLM-driven mechanisms, facilitating realistic behavioral simulations for applications like social studies and market research.
Authors:Changli Wang, Rui Wu, Fang Yin
Abstract:
Human emotions are complex, with sarcasm being a subtle and distinctive form. Despite progress in sarcasm research, sarcasm generation remains underexplored, primarily due to the overreliance on textual modalities and the neglect of visual cues, as well as the mismatch between image content and sarcastic intent in existing datasets. In this paper, we introduce M2SaG, a multimodal sarcasm generation dataset with 4,970 samples, each containing an image, a sarcastic text, and a sarcasm target. To benchmark M2SaG, we propose ViSP, a generation framework that integrates Proximal Policy Optimization (PPO) and contrastive learning. PPO utilizes reward scores from DIP to steer the generation of sarcastic texts, while contrastive learning encourages the model to favor outputs with higher reward scores. These strategies improve overall generation quality and produce texts with more pronounced sarcastic intent. We evaluate ViSP across five metric sets and find it surpasses all baselines, including large language models, underscoring their limitations in sarcasm generation. Furthermore, we analyze the distributions of Sarcasm Scores and Factual Incongruity for both M2SaG and the texts generated by ViSP. The generated texts exhibit higher mean Sarcasm Scores (0.898 vs. 0.770) and Factual Incongruity (0.768 vs. 0.739), demonstrating that ViSP produces higher-quality sarcastic content than the original dataset. % The dataset and code will be publicly available. Our dataset and code will be released at \textit{https://github.com/wclapply/ViSP}.
Chinese: 本文提出了多模态讽刺生成数据集M2SaG和ViSP框架,该框架通过PPO和对比学习提升讽刺文本生成质量,在包括大语言模型在内的基准测试中表现优异。
English: This paper introduces M2SaG, a multimodal sarcasm generation dataset, and ViSP, a framework that enhances sarcastic text generation through PPO and contrastive learning, outperforming baselines including large language models.
Authors:Di Wen, Kunyu Peng, Kailun Yang, Yufan Chen, Ruiping Liu, Junwei Zheng, Alina Roitberg, Danda Pani Paudel, Luc Van Gool, Rainer Stiefelhagen
Abstract:
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support. However, models trained on clean datasets degrade in real-world conditions due to unforeseen corruptions, leading to inaccurate prediction. To address this, we introduce the first robustness benchmark for HOI detection, evaluating model resilience under diverse challenges. Despite advances, current models struggle with environmental variability, occlusions, and noise. Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric. We systematically analyze existing models in the HOI field, revealing significant performance drops under corruptions. To improve robustness, we propose a Semantic-Aware Masking-based Progressive Learning (SAMPL) strategy to guide the model to be optimized based on holistic and partial cues, thus dynamically adjusting the model's optimization to enhance robust feature learning. Extensive experiments show that our approach outperforms state-of-the-art methods, setting a new standard for robust HOI detection. Benchmarks, datasets, and code will be made publicly available at https://github.com/Kratos-Wen/RoHOI.
中文: 本研究提出了首个用于人-物交互检测的鲁棒性基准RoHOI,并设计了一种基于语义感知掩码的渐进学习策略,有效提升了模型在现实干扰下的稳健性,性能优于现有方法。
English: The study introduces RoHOI, the first robustness benchmark for Human-Object Interaction detection, and proposes a Semantic-Aware Masking-based Progressive Learning strategy that significantly enhances model resilience against real-world corruptions, outperforming existing methods.
Authors:Evgenii Rudakov, Jonathan Shock, Otto Lappi, Benjamin Ultan Cowley
Abstract:
This paper introduces a SSSUMO, semi-supervised deep learning approach for submovement decomposition that achieves state-of-the-art accuracy and speed. While submovement analysis offers valuable insights into motor control, existing methods struggle with reconstruction accuracy, computational cost, and validation, due to the difficulty of obtaining hand-labeled data. We address these challenges using a semi-supervised learning framework. This framework learns from synthetic data, initially generated from minimum-jerk principles and then iteratively refined through adaptation to unlabeled human movement data. Our fully convolutional architecture with differentiable reconstruction significantly surpasses existing methods on both synthetic and diverse human motion datasets, demonstrating robustness even in high-noise conditions. Crucially, the model operates in real-time (less than a millisecond per input second), a substantial improvement over optimization-based techniques. This enhanced performance facilitates new applications in human-computer interaction, rehabilitation medicine, and motor control studies. We demonstrate the model's effectiveness across diverse human-performed tasks such as steering, rotation, pointing, object moving, handwriting, and mouse-controlled gaming, showing notable improvements particularly on challenging datasets where traditional methods largely fail. Training and benchmarking source code, along with pre-trained model weights, are made publicly available at https://github.com/dolphin-in-a-coma/sssumo.
中文: 本文提出的SSSUMO是一种半监督深度学习框架,用于子运动分解,实现了顶尖的精度和实时处理能力,显著提升了人机交互和运动控制研究中的应用潜力。
English: This paper presents SSSUMO, a semi-supervised deep learning method for submovement decomposition that achieves top-tier accuracy and real-time processing, enhancing applications in human-computer interaction and motor control studies.
Authors:Chengan He, Jorge Alejandro Amador Herrera, Zhixin Shu, Xin Sun, Yao Feng, Sören Pirk, Dominik L. Michels, Meng Zhang, Tuanfeng Y. Wang, Julie Dorsey, Holly Rushmeier, Yi Zhou
Abstract:
We introduce Digital Salon, a comprehensive hair authoring system that supports real-time 3D hair generation, simulation, and rendering. Unlike existing methods that focus on isolated parts of 3D hair modeling and involve a heavy computation process or network training, Digital Salon offers a holistic and interactive system that lowers the technical barriers of 3D hair modeling through natural language-based interaction. The system guides users through four key stages: text-guided hair retrieval, real-time hair simulation, interactive hair refinement, and hair-conditioned image generation. This cohesive workflow makes advanced hair design accessible to users of varying skill levels and dramatically streamlines the creative process in digital media with an intuitive, versatile, and efficient solution for hair modeling. User studies show that our system can outperform traditional hair modeling workflows for rapid prototyping. Furthermore, we provide insights into the benefits of our system with future potential of deploying our system in real salon environments. More details can be found on our project page: https://digital-salon.github.io/.
Authors:Weibing Zheng, Laurah Turner, Jess Kropczynski, Murat Ozer, Seth Overla, Shane Halse
Abstract:
Assisting medical students with clinical reasoning (CR) during clinical scenario training remains a persistent challenge in medical education. This paper presents the design and architecture of the Fuzzy Supervisor Agent (FSA), a novel component for the Multi-Agent Educational Clinical Scenario Simulation (MAECSS) platform. The FSA leverages a Fuzzy Inference System (FIS) to continuously interpret student interactions with specialized clinical agents (e.g., patient, physical exam, diagnostic, intervention) using pre-defined fuzzy rule bases for professionalism, medical relevance, ethical behavior, and contextual distraction. By analyzing student decision-making processes in real-time, the FSA is designed to deliver adaptive, context-aware feedback and provides assistance precisely when students encounter difficulties. This work focuses on the technical framework and rationale of the FSA, highlighting its potential to provide scalable, flexible, and human-like supervision in simulation-based medical education. Future work will include empirical evaluation and integration into broader educational settings. More detailed design and implementation is~\href{https://github.com/2sigmaEdTech/MAS/}{open sourced here}.
Chinese: 模糊监督代理(FSA)作为临床模拟平台的新组件,通过模糊逻辑监控学生互动,在临床推理训练中提供自适应的情境感知反馈。
English: The Fuzzy Supervisor Agent (FSA) is a novel component for clinical simulation platforms that uses fuzzy logic to monitor student interactions and provide adaptive, context-aware feedback during clinical reasoning training.
Authors:Ziyang Miao, Qiyu Sun, Jingyuan Wang, Yuchen Gong, Yaowei Zheng, Shiqi Li, Richong Zhang
Abstract:
Large language models (LLMs) have shown impressive performance on general-purpose tasks, yet adapting them to specific domains remains challenging due to the scarcity of high-quality domain data. Existing data synthesis tools often struggle to extract reliable fine-tuning data from heterogeneous documents effectively. To address this limitation, we propose Easy Dataset, a unified framework for synthesizing fine-tuning data from unstructured documents via an intuitive graphical user interface (GUI). Specifically, Easy Dataset allows users to easily configure text extraction models and chunking strategies to transform raw documents into coherent text chunks. It then leverages a persona-driven prompting approach to generate diverse question-answer pairs using public-available LLMs. Throughout the pipeline, a human-in-the-loop visual interface facilitates the review and refinement of intermediate outputs to ensure data quality. Experiments on a financial question-answering task show that fine-tuning LLMs on the synthesized dataset significantly improves domain-specific performance while preserving general knowledge. The source code and installable package are available at https://github.com/ConardLi/easy-dataset and have garnered over 9,000 GitHub stars.
中文: Easy Dataset框架通过直观的图形界面,让用户能配置文本提取模型和分块策略,将非结构化文档转化为连贯文本块,并采用角色驱动提示方法生成多样化问答对,结合人工监督确保数据质量,实验表明基于该合成数据微调的大语言模型在特定领域任务中性能显著提升。
English: The Easy Dataset framework addresses the challenge of domain adaptation for large language models by providing a unified GUI tool that synthesizes high-quality fine-tuning data from unstructured documents through configurable extraction models and persona-driven prompting, with human oversight ensuring data quality and experimental results showing significant performance improvements in domain-specific tasks.
Authors:Vineet Kumar Rakesh, Soumya Mazumdar, Research Pratim Maity, Sarbajit Pal, Amitabha Das, Tapas Samanta
Abstract:
Talking Head Generation (THG) has emerged as a transformative technology in computer vision, enabling the synthesis of realistic human faces synchronized with image, audio, text, or video inputs. This paper provides a comprehensive review of methodologies and frameworks for talking head generation, categorizing approaches into 2D--based, 3D--based, Neural Radiance Fields (NeRF)--based, diffusion--based, parameter-driven techniques and many other techniques. It evaluates algorithms, datasets, and evaluation metrics while highlighting advancements in perceptual realism and technical efficiency critical for applications such as digital avatars, video dubbing, ultra-low bitrate video conferencing, and online education. The study identifies challenges such as reliance on pre--trained models, extreme pose handling, multilingual synthesis, and temporal consistency. Future directions include modular architectures, multilingual datasets, hybrid models blending pre--trained and task-specific layers, and innovative loss functions. By synthesizing existing research and exploring emerging trends, this paper aims to provide actionable insights for researchers and practitioners in the field of talking head generation. For the complete survey, code, and curated resource list, visit our GitHub repository: https://github.com/VineetKumarRakesh/thg.
中文: 本文全面综述了说话头生成技术,评估了其应用、挑战及未来方向,为研究人员和从业者提供了实用指导。
English: This paper offers a comprehensive review of talking head generation methods, evaluating their applications, challenges, and future directions to guide researchers and practitioners in the field.
Authors:Chi Zhang, Yu Dong, Yang Wang, Yuetong Han, Guihua Shan, Bixia Tang
Abstract:
Circular genome visualizations are essential for exploring structural variants and gene regulation. However, existing tools often require complex scripting and manual configuration, making the process time-consuming, error-prone, and difficult to learn. To address these challenges, we introduce AuraGenome, an LLM-powered framework for rapid, reusable, and scalable generation of multi-layered circular genome visualizations. AuraGenome combines a semantic-driven multi-agent workflow with an interactive visual analytics system. The workflow employs seven specialized LLM-driven agents, each assigned distinct roles such as intent recognition, layout planning, and code generation, to transform raw genomic data into tailored visualizations. The system supports multiple coordinated views tailored for genomic data, offering ring, radial, and chord-based layouts to represent multi-layered circular genome visualizations. In addition to enabling interactions and configuration reuse, the system supports real-time refinement and high-quality report export. We validate its effectiveness through two case studies and a comprehensive user study. AuraGenome is available at: https://github.com/Darius18/AuraGenome.
中文摘要:AuraGenome是一个基于大语言模型的框架,通过多智能体工作流和交互式分析系统自动生成多层环形基因组可视化,无需复杂编程即可快速创建可定制的高质量基因组图谱。
English Summary: AuraGenome is an LLM-powered framework that automates circular genome visualization through a multi-agent workflow and interactive system, enabling rapid generation of customizable multi-layered genomic diagrams without complex scripting.
Authors:Hendric Voss, Stefan Kopp
Abstract:
Generating accurate and realistic virtual human movements in real-time is of high importance for a variety of applications in computer graphics, interactive virtual environments, robotics, and biomechanics. This paper introduces a novel real-time inverse kinematics (IK) solver specifically designed for realistic human-like movement generation. Leveraging the automatic differentiation and just-in-time compilation of TensorFlow, the proposed solver efficiently handles complex articulated human skeletons with high degrees of freedom. By treating forward and inverse kinematics as differentiable operations, our method effectively addresses common challenges such as error accumulation and complicated joint limits in multi-constrained problems, which are critical for realistic human motion modeling. We demonstrate the solver's effectiveness on the SMPLX human skeleton model, evaluating its performance against widely used iterative-based IK algorithms, like Cyclic Coordinate Descent (CCD), FABRIK, and the nonlinear optimization algorithm IPOPT. Our experiments cover both simple end-effector tasks and sophisticated, multi-constrained problems with realistic joint limits. Results indicate that our IK solver achieves real-time performance, exhibiting rapid convergence, minimal computational overhead per iteration, and improved success rates compared to existing methods. The project code is available at https://github.com/hvoss-techfak/JAX-IK
中文: 本文提出了一种新颖的实时逆运动学求解器,利用TensorFlow的自动微分和即时编译技术高效生成逼真人体运动,在收敛速度和成功率方面优于现有方法。
English: This paper presents a novel real-time inverse kinematics solver that uses TensorFlow's automatic differentiation and just-in-time compilation to generate realistic human movements efficiently, outperforming existing methods in convergence speed and success rates.
Authors:Zhuangzhuang Dai, Vincent Gbouna Zakka, Luis J. Manso, Chen Li
Abstract:
Enabling robots to understand human gaze target is a crucial step to allow capabilities in downstream tasks, for example, attention estimation and movement anticipation in real-world human-robot interactions. Prior works have addressed the in-frame target localization problem with data-driven approaches by carefully removing out-of-frame samples. Vision-based gaze estimation methods, such as OpenFace, do not effectively absorb background information in images and cannot predict gaze target in situations where subjects look away from the camera. In this work, we propose a system to address the problem of 360-degree gaze target estimation from an image in generalized visual scenes. The system, named GazeTarget360, integrates conditional inference engines of an eye-contact detector, a pre-trained vision encoder, and a multi-scale-fusion decoder. Cross validation results show that GazeTarget360 can produce accurate and reliable gaze target predictions in unseen scenarios. This makes a first-of-its-kind system to predict gaze targets from realistic camera footage which is highly efficient and deployable. Our source code is made publicly available at: https://github.com/zdai257/DisengageNet.
Chinese: 本文提出了GazeTarget360系统,通过整合眼神接触检测器、预训练视觉编码器和多尺度融合解码器,实现了从图像中进行360度视线目标估计,在未知场景中产生准确预测,超越了OpenFace等现有方法。
English: This paper introduces GazeTarget360, a novel system that enables 360-degree gaze target estimation from images by integrating an eye-contact detector, a pre-trained vision encoder, and a multi-scale-fusion decoder, achieving accurate predictions in unseen scenarios and outperforming prior methods like OpenFace.
Authors:Liujian Tang, Shaokang Dong, Yijia Huang, Minqi Xiang, Hongtao Ruan, Bin Wang, Shuo Li, Zhiheng Xi, Zhihui Cao, Hailiang Pang, Heng Kong, He Yang, Mingxu Chai, Zhilin Gao, Xingyu Liu, Yingnan Fu, Jiaming Liu, Xuanjing Huang, Yu-Gang Jiang, Tao Gui, Qi Zhang, Kang Wang, Yunke Zhang, Yuran Wang
Abstract:
This paper presents MagicGUI, a foundational mobile GUI agent designed to address critical challenges in perception, grounding, and reasoning within real-world mobile GUI environments. The framework is underpinned by following six key components: (1) a comprehensive and accurate dataset, constructed via the scalable GUI Data Pipeline, which aggregates the largest and most diverse GUI-centric multimodal data to date from open-source repositories, automated crawling, and targeted manual annotation; (2) enhanced perception and grounding capabilities, facilitating fine-grained multimodal alignment for UI element referencing, grounding, and screen comprehension; (3) a comprehensive and unified action space, encompassing both fundamental UI operations and complex interactive intents to support human-agent interactions; (4) planning-oriented reasoning mechanisms that enable the model to decompose complex user instructions into sequential actions with explicit intermediate meta-paln reasoning; (5) an iterative two-stage training procedure, combining large-scale continue pre-training on 7.8M samples with reinforcement fine-tuning utilizing a spatially enhanced composite reward and dual filtering strategy; and (6) competitive performance on both the proprietary Magic-RICH benchmark and over a dozen public benchmarks, achieving superior performance across GUI perception and agent tasks, while demonstrating robust generalization and real-world deployment potential in practical mobile GUI scenarios, as detailed in Figure 1.
中文: MagicGUI是一种基础性移动GUI代理,通过六大核心组件解决感知、定位和推理难题,在多个基准测试中表现优异,并展现出强大的实际应用潜力。
English: MagicGUI is a foundational mobile GUI agent that tackles perception, grounding, and reasoning challenges through six key components, including a comprehensive dataset and enhanced capabilities, achieving top performance on benchmarks and demonstrating strong real-world potential.
Authors:Joy Jia Yin Lim, Daniel Zhang-Li, Jifan Yu, Xin Cong, Ye He, Zhiyuan Liu, Huiqin Liu, Lei Hou, Juanzi Li, Bin Xu
Abstract:
Standardized, one-size-fits-all educational content often fails to connect with students' individual backgrounds and interests, leading to disengagement and a perceived lack of relevance. To address this challenge, we introduce PAGE, a novel framework that leverages large language models (LLMs) to automatically personalize educational materials by adapting them to each student's unique context, such as their major and personal interests. To validate our approach, we deployed PAGE in a semester-long intelligent tutoring system and conducted a user study to evaluate its impact in an authentic educational setting. Our findings show that students who received personalized content demonstrated significantly improved learning outcomes and reported higher levels of engagement, perceived relevance, and trust compared to those who used standardized materials. This work demonstrates the practical value of LLM-powered personalization and offers key design implications for creating more effective, engaging, and trustworthy educational experiences.
中文: PAGE框架利用大型语言模型为每位学生量身定制教育内容,相比标准化材料,显著提升了学习效果、参与度和内容相关性感知。
English: The PAGE framework utilizes large language models to tailor educational content to individual students' contexts, significantly enhancing learning outcomes, engagement, and perceived relevance compared to standardized materials.
Authors:Honglu Zhou, Xiangyu Peng, Shrikant Kendre, Michael S. Ryoo, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles
Abstract:
Next-generation AI companions must go beyond general video understanding to resolve spatial and temporal references in dynamic, real-world environments. Existing Video Large Language Models (Video LLMs), while capable of coarse-level comprehension, struggle with fine-grained, spatiotemporal reasoning, especially when user queries rely on time-based event references for temporal anchoring, or gestural cues for spatial anchoring to clarify object references and positions. To bridge this critical gap, we introduce Strefer, a synthetic instruction data generation framework designed to equip Video LLMs with spatiotemporal referring and reasoning capabilities. Strefer produces diverse instruction-tuning data using a data engine that pseudo-annotates temporally dense, fine-grained video metadata, capturing rich spatial and temporal information in a structured manner, including subjects, objects, their locations as masklets, and their action descriptions and timelines. Our approach enhances the ability of Video LLMs to interpret spatial and temporal references, fostering more versatile, space-time-aware reasoning essential for real-world AI companions. Without using proprietary models, costly human annotation, or the need to annotate large volumes of new videos, experimental evaluations show that models trained with data produced by Strefer outperform baselines on tasks requiring spatial and temporal disambiguation. Additionally, these models exhibit enhanced space-time-aware reasoning, establishing a new foundation for perceptually grounded, instruction-tuned Video LLMs.
中文摘要:Strefer是一种合成数据生成框架,通过创建多样化的指令调优数据来增强视频大语言模型的时空推理能力,使其无需昂贵的人工标注即可更准确地解析视频中的空间和时间参照。
English Summary: Strefer is a synthetic data generation framework that enhances Video LLMs' spatiotemporal reasoning by creating diverse instruction-tuning data, enabling them to better interpret spatial and temporal references in videos without costly human annotation.
Authors:Haoming Wang, Haoyang Zou, Huatong Song, Jiazhan Feng, Junjie Fang, Junting Lu, Longxiang Liu, Qinyu Luo, Shihao Liang, Shijue Huang, Wanjun Zhong, Yining Ye, Yujia Qin, Yuwen Xiong, Yuxin Song, Zhiyong Wu, Aoyan Li, Bo Li, Chen Dun, Chong Liu, Daoguang Zan, Fuxing Leng, Hanbin Wang, Hao Yu, Haobin Chen, Hongyi Guo, Jing Su, Jingjia Huang, Kai Shen, Kaiyu Shi, Lin Yan, Peiyao Zhao, Pengfei Liu, Qinghao Ye, Renjie Zheng, Shulin Xin, Wayne Xin Zhao, Wen Heng, Wenhao Huang, Wenqian Wang, Xiaobo Qin, Yi Lin, Youbin Wu, Zehui Chen, Zihao Wang, Baoquan Zhong, Xinchun Zhang, Xujing Li, Yuanfan Li, Zhongkai Zhao, Chengquan Jiang, Faming Wu, Haotian Zhou, Jinlin Pang, Li Han, Qi Liu, Qianli Ma, Siyao Liu, Songhua Cai, Wenqi Fu, Xin Liu, Yaohui Wang, Zhi Zhang, Bo Zhou, Guoliang Li, Jiajun Shi, Jiale Yang, Jie Tang, Li Li, Qihua Han, Taoran Lu, Woyu Lin, Xiaokang Tong, Xinyao Li, Yichi Zhang, Yu Miao, Zhengxuan Jiang, Zili Li, Ziyuan Zhao, Chenxin Li, Dehua Ma, Feng Lin, Ge Zhang, Haihua Yang, Hangyu Guo, Hongda Zhu, Jiaheng Liu, Junda Du, Kai Cai, Kuanye Li, Lichen Yuan, Meilan Han, Minchao Wang, Shuyue Guo, Tianhao Cheng, Xiaobo Ma, Xiaojun Xiao, Xiaolong Huang, Xinjie Chen, Yidi Du, Yilin Chen, Yiwen Wang, Zhaojian Li, Zhenzhu Yang, Zhiyuan Zeng, Chaolin Jin, Chen Li, Hao Chen, Haoli Chen, Jian Chen, Qinghao Zhao, Guang Shi
Abstract:
The development of autonomous agents for graphical user interfaces (GUIs) presents major challenges in artificial intelligence. While recent advances in native agent models have shown promise by unifying perception, reasoning, action, and memory through end-to-end learning, open problems remain in data scalability, multi-turn reinforcement learning (RL), the limitations of GUI-only operation, and environment stability. In this technical report, we present UI-TARS-2, a native GUI-centered agent model that addresses these challenges through a systematic training methodology: a data flywheel for scalable data generation, a stabilized multi-turn RL framework, a hybrid GUI environment that integrates file systems and terminals, and a unified sandbox platform for large-scale rollouts. Empirical evaluation demonstrates that UI-TARS-2 achieves significant improvements over its predecessor UI-TARS-1.5. On GUI benchmarks, it reaches 88.2 on Online-Mind2Web, 47.5 on OSWorld, 50.6 on WindowsAgentArena, and 73.3 on AndroidWorld, outperforming strong baselines such as Claude and OpenAI agents. In game environments, it attains a mean normalized score of 59.8 across a 15-game suite-roughly 60% of human-level performance-and remains competitive with frontier proprietary models (e.g., OpenAI o3) on LMGame-Bench. Additionally, the model can generalize to long-horizon information-seeking tasks and software engineering benchmarks, highlighting its robustness across diverse agent tasks. Detailed analyses of training dynamics further provide insights into achieving stability and efficiency in large-scale agent RL. These results underscore UI-TARS-2's potential to advance the state of GUI agents and exhibit strong generalization to real-world interactive scenarios.
中文: UI-TARS-2 是一种原生图形用户界面代理模型,通过系统性训练方法解决了数据可扩展性、多轮强化学习和环境稳定性等关键挑战,在图形界面基准测试中表现优异,并在游戏与长周期任务中展现出强大泛化能力。
English: UI-TARS-2 is a native GUI agent model that addresses key challenges in data scalability, multi-turn reinforcement learning, and environment stability, achieving superior performance on GUI benchmarks and competitive results in gaming and long-horizon tasks.
Authors:Qinglong Yang, Haoming Li, Haotian Zhao, Xiaokai Yan, Jingtao Ding, Fengli Xu, Yong Li
Abstract:
Mobile GUI agents are becoming critical tools for enhancing human-device interaction efficiency, with multimodal large language models (MLLMs) emerging as dominant paradigms in this domain. Current agents, however, are limited to following explicit human instructions, resulting in insufficient capability for proactive intent anticipation. Additionally, these agents fail to leverage the contextual information associated with users during task execution, thereby neglecting potentially vast differences in user preferences. To address these challenges, we introduce the FingerTip benchmark. It contains two new tracks: proactive task suggestions by analyzing environment observation and users' previous intents, and personalized task execution by catering to users' action preferences. We collected unique human demonstrations of multi-step Android device interactions across a variety of everyday apps. These demonstrations are not isolated but are continuously acquired from the users' long-term usage in their real lives, and encompass essential user-related contextual information. Our experiments reveal challenges of the tasks we propose. The model fine-tuned with the data we collected effectively utilized user information and achieved good results, highlighting the potential of our approach in building more user-oriented mobile GUI agents. Our code is open-source at https://anonymous.4open.science/r/FingerTip-57B8 for reproducibility.
中文摘要:FingerTip基准通过引入基于环境观察和用户历史意图的主动任务建议及个性化执行功能,解决了当前移动GUI代理缺乏主动性和个性化的问题,利用真实用户交互数据显著提升了代理的实用性和用户导向性。
English Summary: The FingerTip benchmark is introduced to overcome limitations in current mobile GUI agents by enabling proactive intent anticipation and personalized task execution through user-specific contextual data, demonstrating improved performance in user-oriented interactions.
Authors:Hang Wu, Hongkai Chen, Yujun Cai, Chang Liu, Qingwen Ye, Ming-Hsuan Yang, Yiwei Wang
Abstract:
Grounding natural language queries in graphical user interfaces (GUIs) poses unique challenges due to the diversity of visual elements, spatial clutter, and the ambiguity of language. In this paper, we introduce DiMo-GUI, a training-free framework for GUI grounding that leverages two core strategies: dynamic visual grounding and modality-aware optimization. Instead of treating the GUI as a monolithic image, our method splits the input into textual elements and iconic elements, allowing the model to reason over each modality independently using general-purpose vision-language models. When predictions are ambiguous or incorrect, DiMo-GUI dynamically focuses attention by generating candidate focal regions centered on the model's initial predictions and incrementally zooms into subregions to refine the grounding result. This hierarchical refinement process helps disambiguate visually crowded layouts without the need for additional training or annotations. We evaluate our approach on standard GUI grounding benchmarks and demonstrate consistent improvements over baseline inference pipelines, highlighting the effectiveness of combining modality separation with region-focused reasoning.
中文: DiMo-GUI是一种无需训练的图形界面定位框架,通过动态聚焦视觉区域并优化文本与图标模态处理,有效提升复杂界面中的定位精度且无需额外训练。
English: DiMo-GUI is a training-free framework that enhances GUI grounding by dynamically focusing on visual regions and optimizing across text and icon modalities, improving accuracy in cluttered interfaces without additional training.
Authors:Bingsheng Yao, Menglin Zhao, Zhan Zhang, Pengqi Wang, Emma G Chester, Changchang Yin, Tianshi Li, Varun Mishra, Lace Padilla, Odysseas Chatzipanagiotou, Timothy Pawlik, Ping Zhang, Weidan Cao, Dakuo Wang
Abstract:
Post-surgery care involves ongoing collaboration between provider teams and patients, which starts from post-surgery hospitalization through home recovery after discharge. While prior HCI research has primarily examined patients' challenges at home, less is known about how provider teams coordinate discharge preparation and care handoffs, and how breakdowns in communication and care pathways may affect patient recovery. To investigate this gap, we conducted semi-structured interviews with 13 healthcare providers and 4 patients in the context of gastrointestinal (GI) surgery. We found coordination boundaries between in- and out-patient teams, coupled with complex organizational structures within teams, impeded the "invisible work" of preparing patients' home care plans and triaging patient information. For patients, these breakdowns resulted in inadequate preparation for home transition and fragmented self-collected data, both of which undermine timely clinical decision-making. Based on these findings, we outline design opportunities to formalize task ownership and handoffs, contextualize co-temporal signals, and align care plans with home resources.
中文: 本研究发现,住院与门诊团队间的协调鸿沟及复杂的组织结构阻碍了出院准备和患者数据管理,导致家庭过渡准备不足和自我监测数据碎片化,从而影响临床决策的及时性。
English: This study reveals that coordination gaps between inpatient and outpatient teams, along with complex organizational structures, hinder effective discharge preparation and patient data management, leading to inadequate home transition and fragmented self-monitoring that impede timely clinical decisions.
Authors:Lu Sun, Shihan Fu, Bingsheng Yao, Yuxuan Lu, Wenbo Li, Hansu Gu, Jiri Gesi, Jing Huang, Chen Luo, Dakuo Wang
Abstract:
Agentic AI is emerging, capable of executing tasks through natural language, such as Copilot for coding or Amazon Rufus for shopping. Evaluating these systems is challenging, as their rapid evolution outpaces traditional human evaluation. Researchers have proposed LLM Agents to simulate participants as digital twins, but it remains unclear to what extent a digital twin can represent a specific customer in multi-turn interaction with an agentic AI system. In this paper, we recruited 40 human participants to shop with Amazon Rufus, collected their personas, interaction traces, and UX feedback, and then created digital twins to repeat the task. Pairwise comparison of human and digital-twin traces shows that while agents often explored more diverse choices, their action patterns aligned with humans and yielded similar design feedback. This study is the first to quantify how closely LLM agents can mirror human multi-turn interaction with an agentic AI system, highlighting their potential for scalable evaluation.
中文: 本研究对比了人类与数字孪生与亚马逊Rufus的交互,发现尽管LLM智能体探索了更多选项,但其行为模式与人类高度一致且反馈相似,证明了它们在代理AI系统可扩展评估方面的潜力。
English: This study compares human and digital twin interactions with Amazon Rufus, finding that while LLM agents explore more options, they align closely with human action patterns and feedback, demonstrating their potential for scalable evaluation of agentic AI systems.
Authors:Bingsheng Yao, Jiaju Chen, Chaoran Chen, April Wang, Toby Jia-jun Li, Dakuo Wang
Abstract:
Intelligent systems have traditionally been designed as tools rather than collaborators, often lacking critical characteristics that collaboration partnerships require. Recent advances in large language model (LLM) agents open new opportunities for human-LLM-agent collaboration by enabling natural communication and various social and cognitive behaviors. Yet it remains unclear whether principles of computer-mediated collaboration established in HCI and CSCW persist, change, or fail when humans collaborate with LLM agents. To support systematic investigations of these questions, we introduce an open and configurable research platform for HCI researchers. The platform's modular design allows seamless adaptation of classic CSCW experiments and manipulation of theory-grounded interaction controls. We demonstrate the platform's effectiveness and usability through two case studies: (1) re-implementing the classic human-human-collaboration task Shape Factory as a between-subject human-agent-collaboration experiment with 16 participants, and (2) a participatory cognitive walkthrough with five HCI researchers to refine workflows and interfaces for experiment setup and analysis.
中文: 传统智能系统被设计为工具而非协作伙伴,但大型语言模型(LLM)智能体的发展为人类与智能体协作开辟了新途径,不过现有计算机中介协作原则在人类与LLM智能体协作中的适用性尚不明确;为此我们开发了一个可配置的HCI研究平台,并通过人机协作实验和参与式评估案例验证了其有效性。
English: Traditional intelligent systems have been designed as tools rather than collaborative partners, but recent advances in LLM agents offer new opportunities for human-agent collaboration, though it remains unclear how established computer-mediated collaboration principles apply; to investigate this, we introduce a configurable research platform for HCI researchers, demonstrated through case studies including a human-agent collaboration experiment and participatory evaluations.
Authors:Jingyu Tang, Chaoran Chen, Jiawen Li, Zhiping Zhang, Bingcan Guo, Ibrahim Khalilov, Simret Araya Gebreegziabher, Bingsheng Yao, Dakuo Wang, Yanfang Ye, Tianshi Li, Ziang Xiao, Yaxing Yao, Toby Jia-Jun Li
Abstract:
The dark patterns, deceptive interface designs manipulating user behaviors, have been extensively studied for their effects on human decision-making and autonomy. Yet, with the rising prominence of LLM-powered GUI agents that automate tasks from high-level intents, understanding how dark patterns affect agents is increasingly important. We present a two-phase empirical study examining how agents, human participants, and human-AI teams respond to 16 types of dark patterns across diverse scenarios. Phase 1 highlights that agents often fail to recognize dark patterns, and even when aware, prioritize task completion over protective action. Phase 2 revealed divergent failure modes: humans succumb due to cognitive shortcuts and habitual compliance, while agents falter from procedural blind spots. Human oversight improved avoidance but introduced costs such as attentional tunneling and cognitive load. Our findings show neither humans nor agents are uniformly resilient, and collaboration introduces new vulnerabilities, suggesting design needs for transparency, adjustable autonomy, and oversight.
中文摘要:暗黑模式对人类和AI代理构成不同风险,人类因认知偏见而受骗,AI代理因程序盲点而忽视欺骗性设计,人机协作反而引入新漏洞,需通过透明设计和可调节监管来应对。
English Summary: Dark patterns pose distinct risks to both humans and AI agents, with humans falling prey to cognitive biases while agents overlook deceptive designs due to procedural gaps, and human-AI collaboration introduces new vulnerabilities requiring transparent and adjustable oversight.
Authors:Xiao Li, Yanfan Zhu, Ruining Deng, Wei-Qi Wei, Yu Wang, Shilin Zhao, Yaohong Wang, Haichun Yang, Yuankai Huo
Abstract:
Recent advances in medical vision-language models (VLMs) open up remarkable opportunities for clinical applications such as automated report generation, copilots for physicians, and uncertainty quantification. However, despite their promise, medical VLMs introduce serious security concerns, most notably risks of Protected Health Information (PHI) exposure, data leakage, and vulnerability to cyberthreats - which are especially critical in hospital environments. Even when adopted for research or non-clinical purposes, healthcare organizations must exercise caution and implement safeguards. To address these challenges, we present MedFoundationHub, a graphical user interface (GUI) toolkit that: (1) enables physicians to manually select and use different models without programming expertise, (2) supports engineers in efficiently deploying medical VLMs in a plug-and-play fashion, with seamless integration of Hugging Face open-source models, and (3) ensures privacy-preserving inference through Docker-orchestrated, operating system agnostic deployment. MedFoundationHub requires only an offline local workstation equipped with a single NVIDIA A6000 GPU, making it both secure and accessible within the typical resources of academic research labs. To evaluate current capabilities, we engaged board-certified pathologists to deploy and assess five state-of-the-art VLMs (Google-MedGemma3-4B, Qwen2-VL-7B-Instruct, Qwen2.5-VL-7B-Instruct, and LLaVA-1.5-7B/13B). Expert evaluation covered colon cases and renal cases, yielding 1015 clinician-model scoring events. These assessments revealed recurring limitations, including off-target answers, vague reasoning, and inconsistent pathology terminology.
中文:医学视觉语言模型虽具临床应用潜力,却存在严重安全隐患,为此研发的MedFoundationHub工具包通过图形界面实现安全便捷的模型部署与评估,并发现现有模型存在术语不一致等缺陷。
English: Medical vision-language models offer promising clinical applications but raise critical security risks, leading to the development of MedFoundationHub, a GUI toolkit that enables secure, user-friendly deployment and evaluation of these models while identifying current limitations like inconsistent terminology.
Authors:Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang
Abstract:
Graphical User Interface (GUI) grounding maps natural language instructions to precise interface locations for autonomous interaction. Current reinforcement learning approaches use binary rewards that treat elements as hit-or-miss targets, creating sparse signals that ignore the continuous nature of spatial interactions. Motivated by human clicking behavior that naturally forms Gaussian distributions centered on target elements, we introduce GUI Gaussian Grounding Rewards (GUI-G$^2$), a principled reward framework that models GUI elements as continuous Gaussian distributions across the interface plane. GUI-G$^2$ incorporates two synergistic mechanisms: Gaussian point rewards model precise localization through exponentially decaying distributions centered on element centroids, while coverage rewards assess spatial alignment by measuring the overlap between predicted Gaussian distributions and target regions. To handle diverse element scales, we develop an adaptive variance mechanism that calibrates reward distributions based on element dimensions. This framework transforms GUI grounding from sparse binary classification to dense continuous optimization, where Gaussian distributions generate rich gradient signals that guide models toward optimal interaction positions. Extensive experiments across ScreenSpot, ScreenSpot-v2, and ScreenSpot-Pro benchmarks demonstrate that GUI-G$^2$, substantially outperforms state-of-the-art method UI-TARS-72B, with the most significant improvement of 24.7% on ScreenSpot-Pro. Our analysis reveals that continuous modeling provides superior robustness to interface variations and enhanced generalization to unseen layouts, establishing a new paradigm for spatial reasoning in GUI interaction tasks.
中文摘要:本研究提出GUI-G²奖励框架,将图形界面元素建模为高斯分布以实现密集连续优化,相比现有方法性能提升最高达24.7%,并显著增强了对界面变化的鲁棒性。
English Summary: The study introduces GUI-G², a reward framework that models GUI elements as Gaussian distributions to provide dense continuous optimization signals, significantly outperforming existing methods by up to 24.7% and improving robustness to interface variations.
Authors:Marta Robledo-Moreno, Ruben Vera-Rodriguez, Ruben Tolosana, Javier Ortega-Garcia, Andres Huergo, Julian Fierrez
Abstract:
Behavioral biometrics based on smartphone motion sensors are growing in popularity for authentication purposes. In this study, AirSignatureDB is presented: a new publicly accessible dataset of in-air signatures collected from 108 participants under real-world conditions, using 83 different smartphone models across four sessions. This dataset includes genuine samples and skilled forgeries, enabling a comprehensive evaluation of system robustness against realistic attack scenarios. Traditional and deep learning-based methods for in-air signature verification are benchmarked, while analyzing the influence of sensor modality and enrollment strategies. Beyond verification, a first approach to reconstructing the three-dimensional trajectory of in-air signatures from inertial sensor data alone is introduced. Using on-line handwritten signatures as a reference, we demonstrate that the recovery of accurate trajectories is feasible, challenging the long-held assumption that in-air gestures are inherently traceless. Although this approach enables forensic traceability, it also raises critical questions about the privacy boundaries of behavioral biometrics. Our findings underscore the need for a reevaluation of the privacy assumptions surrounding inertial sensor data, as they can reveal user-specific information that had not previously been considered in the design of in-air signature systems.
中文: 本研究推出了AirSignatureDB这一公开的在空中签名数据集,通过108名参与者使用多种智能手机采集数据,不仅为验证方法提供了基准测试平台,还首次实现了仅从惯性传感器数据重建三维签名轨迹,打破了空中手势无痕的固有认知,并引发了对行为生物识别隐私边界的重要思考。
English: This study introduces AirSignatureDB, a public dataset of in-air signatures collected from 108 participants using various smartphones, which enables benchmarking verification methods and demonstrates the feasibility of reconstructing 3D signature trajectories from inertial sensor data, challenging assumptions about their traceless nature and raising privacy concerns.
Authors:Zhengliang Shi, Ruotian Ma, Jen-tse Huang, Xinbei Ma, Xingyu Chen, Mengru Wang, Qu Yang, Yue Wang, Fanghua Ye, Ziyang Chen, Shanyi Wang, Cixing Li, Wenxuan Wang, Zhaopeng Tu, Xiaolong Li, Zhaochun Ren, Linus
Abstract:
Large language models (LLMs) are increasingly entrusted with high-stakes decisions that affect human welfare. However, the principles and values that guide these models when distributing scarce societal resources remain largely unexamined. To address this, we introduce the Social Welfare Function (SWF) Benchmark, a dynamic simulation environment where an LLM acts as a sovereign allocator, distributing tasks to a heterogeneous community of recipients. The benchmark is designed to create a persistent trade-off between maximizing collective efficiency (measured by Return on Investment) and ensuring distributive fairness (measured by the Gini coefficient). We evaluate 20 state-of-the-art LLMs and present the first leaderboard for social welfare allocation. Our findings reveal three key insights: (i) A model's general conversational ability, as measured by popular leaderboards, is a poor predictor of its allocation skill. (ii) Most LLMs exhibit a strong default utilitarian orientation, prioritizing group productivity at the expense of severe inequality. (iii) Allocation strategies are highly vulnerable, easily perturbed by output-length constraints and social-influence framing. These results highlight the risks of deploying current LLMs as societal decision-makers and underscore the need for specialized benchmarks and targeted alignment for AI governance.
中文摘要:该研究引入社会福利函数基准评估大语言模型的资源分配能力,发现多数模型重效率轻公平且策略易受干扰,揭示了其作为社会决策工具存在的风险。
English Summary: The study introduces the Social Welfare Function Benchmark to evaluate how large language models allocate resources, revealing that most models prioritize efficiency over fairness and are highly sensitive to contextual changes, highlighting risks in their use for societal decisions.
Authors:Joanna Sorysz, Lars Krupp, Dominique Nshimyimana, Meagan B. Loerakker, Bo Zhou, Paul Lukowicz, Jakob Karolus
Abstract:
As wearable technologies continue to evolve-becoming smaller, more powerful, and more deeply embedded in daily life-their integration into diverse user contexts raises critical design challenges. There remains a notable gap in large-scale empirical data on where users actually wear or carry these devices throughout the day, systematically examining user preferences for wearable placement across varied contexts and routines. In this work, we conducted a questionnaire in several countries aimed at capturing real-world habits related to wearable device placement. The results from n = 320 participants reveal how wearable usage patterns shift depending on time of day and context. We propose a set of practical, user-centered guidelines for sensor placement and discuss how they align or diverge from assumptions seen in existing ISWC work. This study contributes to ongoing efforts within the community to design more inclusive, adaptable, and context-aware wearable systems.
中文: 本研究通过多国问卷调查填补了可穿戴设备实际佩戴位置的大规模数据空白,揭示了320名参与者的使用模式如何随时间和情境变化,并提出了以用户为中心的指导原则,以推动更具适应性的可穿戴系统设计。
English: This study addresses the lack of large-scale data on wearable device placement by surveying 320 participants across multiple countries, revealing how usage patterns vary with time and context, and proposes user-centered guidelines to inform more adaptable wearable system designs.
Authors:Lala Shakti Swarup Ray, Vitor Fortes Rey, Bo Zhou, Paul Lukowicz, Sungho Suh
Abstract:
Prolonged seated activity is increasingly common in modern environments, raising concerns around musculoskeletal health, ergonomics, and the design of responsive interactive systems. Existing posture sensing methods such as vision-based or wearable approaches face limitations including occlusion, privacy concerns, user discomfort, and restricted deployment flexibility. We introduce ChairPose, the first full body, wearable free seated pose estimation system that relies solely on pressure sensing and operates independently of chair geometry. ChairPose employs a two stage generative model trained on pressure maps captured from a thin, chair agnostic sensing mattress. Unlike prior approaches, our method explicitly incorporates chair morphology into the inference process, enabling accurate, occlusion free, and privacy preserving pose estimation. To support generalization across diverse users and chairs, we introduce a physics driven data augmentation pipeline that simulates realistic variations in posture and seating conditions. Evaluated across eight users and four distinct chairs, ChairPose achieves a mean per joint position error of 89.4 mm when both the user and the chair are unseen, demonstrating robust generalization to novel real world generalizability. ChairPose expands the design space for posture aware interactive systems, with potential applications in ergonomics, healthcare, and adaptive user interfaces.
Chinese: ChairPose首次推出仅通过压力传感实现的无穿戴式全身坐姿估计系统,能在不同用户和椅子上实现精确、保护隐私且无遮挡的姿势追踪,无需依赖特定椅子结构。
English: ChairPose introduces the first wearable-free, full-body seated pose estimation system using only pressure sensing, achieving accurate and privacy-preserving posture tracking across various chairs and users without occlusion or chair-specific constraints.
Authors:Lala Shakti Swarup Ray, Bo Zhou, Paul Lukowicz
Abstract:
Inertial measurement units (IMUs) are central to wearable systems for activity recognition and pose estimation, but sensor placement remains largely guided by heuristics and convention. In this work, we introduce Where to Wear (W2W), a simulation-based framework for systematic exploration of IMU placement utility across the body. Using labeled motion capture data, W2W generates realistic synthetic IMU signals at 512 anatomically distributed surface patches, enabling high-resolution, task-specific evaluation of sensor performance. We validate reliability of W2W by comparing spatial performance rankings from synthetic data with real IMU recordings in two multimodal datasets, confirming strong agreement in activity-wise trends. Further analysis reveals consistent spatial trends across activity types and uncovers overlooked high-utility regions that are rarely used in commercial systems. These findings challenge long-standing placement norms and highlight opportunities for more efficient, task-adaptive sensor configurations. Overall, our results demonstrate that simulation with W2W can serve as a powerful design tool for optimizing sensor placement, enabling scalable, data-driven strategies that are impractical to obtain through physical experimentation alone.
Chinese: W2W仿真框架通过运动捕捉数据生成合成信号,系统评估了IMU传感器在身体各部位的最佳放置位置,揭示了传统方案忽略的高效区域,为可穿戴设备设计提供了数据驱动的优化方法。
English: The W2W simulation framework systematically evaluates optimal IMU sensor placements by generating synthetic data from motion capture, revealing overlooked high-utility body regions and challenging conventional placement norms for more efficient wearable designs.
Authors:Mengxi Liu, Daniel GeiÃler, Deepika Gurung, Hymalai Bello, Bo Zhou, Sizhen Bian, Paul Lukowicz, Passant Elagroudy
Abstract:
Breathing is a spontaneous but controllable body function that can be used for hands-free interaction. Our work introduces "iBreath", a novel system to detect breathing gestures similar to clicks using bio-impedance. We evaluated iBreath's accuracy and user experience using two lab studies (n=34). Our results show high detection accuracy (F1-scores > 95.2%). Furthermore, the users found the gestures easy to use and comfortable. Thus, we developed eight practical guidelines for the future development of breathing gestures. For example, designers can train users on new gestures within just 50 seconds (five trials), and achieve robust performance with both user-dependent and user-independent models trained on data from 21 participants, each yielding accuracies above 90%. Users preferred single clicks and disliked triple clicks. The median gesture duration is 3.5-5.3 seconds. Our work provides solid ground for researchers to experiment with creating breathing gestures and interactions.
中文: iBreath是一种基于生物阻抗的系统,能以超过95.2%的F1分数高精度检测呼吸手势,用户反馈舒适易用,研究提出了八项开发指南(如50秒快速训练),为呼吸交互研究奠定了坚实基础。
English: iBreath is a bio-impedance-based system that detects breathing gestures with high accuracy (over 95.2% F1-score) and user comfort, establishing eight practical guidelines for future development, such as rapid training and robust performance across user models.
Authors:Laura Ribeiro, Muhammad Shaheer, Miguel Fernandez-Cortizas, Ali Tourani, Holger Voos, Jose Luis Sanchez-Lopez
Abstract:
Semantic SLAM (Simultaneous Localization and Mapping) systems enrich robot maps with structural and semantic information, enabling robots to operate more effectively in complex environments. However, these systems struggle in real-world scenarios with occlusions, incomplete data, or ambiguous geometries, as they cannot fully leverage the higher-level spatial and semantic knowledge humans naturally apply. We introduce HICS-SLAM, a Human-in-the-Loop semantic SLAM framework that uses a shared extended reality environment for real-time collaboration. The system allows human operators to directly interact with and visualize the robot's 3D scene graph, and add high-level semantic concepts (e.g., rooms or structural entities) into the mapping process. We propose a graph-based semantic fusion methodology that integrates these human interventions with robot perception, enabling scalable collaboration for enhanced situational awareness. Experimental evaluations on real-world construction site datasets demonstrate improvements in room detection accuracy, map precision, and semantic completeness compared to automated baselines, demonstrating both the effectiveness of the approach and its potential for future extensions.
Chinese Summary: HICS-SLAM是一种人在回路的语义SLAM框架,通过扩展现实实现人机实时协作,在复杂环境中显著提升了机器人的建图精度和语义完整性。
English Summary: HICS-SLAM is a human-in-the-loop semantic SLAM framework that integrates real-time human interventions through extended reality to enhance robot mapping accuracy and semantic completeness in challenging environments.
Authors:Bohan Jiang, Dawei Li, Zhen Tan, Chengshuai Zhao, Huan Liu
Abstract:
Well-being encompasses mental, physical, and social dimensions essential to personal growth and informed life decisions. As individuals increasingly consult Large Language Models (LLMs) to understand well-being, a key challenge emerges: Can LLMs generate explanations that are not only accurate but also tailored to diverse audiences? High-quality explanations require both factual correctness and the ability to meet the expectations of users with varying expertise. In this work, we construct a large-scale dataset comprising 43,880 explanations of 2,194 well-being concepts, generated by ten diverse LLMs. We introduce a principle-guided LLM-as-a-judge evaluation framework, employing dual judges to assess explanation quality. Furthermore, we show that fine-tuning an open-source LLM using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) can significantly enhance the quality of generated explanations. Our results reveal: (1) The proposed LLM judges align well with human evaluations; (2) explanation quality varies significantly across models, audiences, and categories; and (3) DPO- and SFT-finetuned models outperform their larger counterparts, demonstrating the effectiveness of preference-based learning for specialized explanation tasks.
Chinese: 本研究通过构建大规模数据集和原则引导的评估框架,评估大型语言模型生成针对性健康解释的能力,结果表明基于偏好的微调方法能显著提升针对不同受众的解释质量。
English: This study evaluates how well Large Language Models (LLMs) generate tailored well-being explanations by creating a large dataset and a principle-guided evaluation framework, revealing that fine-tuning with preference-based learning significantly improves explanation quality across diverse audiences.
Authors:Kaiqi Yang, Hang Li, Yucheng Chu, Ahreum Han, Yasemin Copur-Gencturk, Jiliang Tang, Hui Liu
Abstract:
Professional development (PD) serves as the cornerstone for teacher tutors to grasp content knowledge. However, providing equitable and timely PD opportunities for teachers poses significant challenges. To address this issue, we introduce I-VIP (Intelligent Virtual Interactive Program), an intelligent tutoring platform for teacher professional development, driven by large language models (LLMs) and supported by multi-agent frameworks. This platform offers a user-friendly conversational interface and allows users to employ a variety of interactive tools to facilitate question answering, knowledge comprehension, and reflective summarization while engaging in dialogue. To underpin the functionality of this platform, including knowledge expectation analysis, response scoring and classification, and feedback generation, the multi-agent frameworks are leveraged to enhance the accuracy of judgments and mitigate the issue of missing key points.
中文摘要:I-VIP是一个基于大语言模型和多智能体框架的智能辅导平台,通过对话界面和交互工具为教师提供公平的专业发展机会。
English Summary: I-VIP is an intelligent tutoring platform that uses large language models and multi-agent frameworks to provide equitable professional development for teachers through interactive tools and conversational interfaces.
Authors:Tiantian Feng, Brandon M Booth, Karel Mundnich, Emily Zhou, Benjamin Girault, Kristina Lerman, Shrikanth Narayanan
Abstract:
Sleep is important for everyday functioning, overall well-being, and quality of life. Recent advances in wearable sensing technology have enabled continuous, noninvasive, and cost-effective monitoring of sleep patterns in real-world natural living settings. Wrist-worn devices, in particular, are capable of tracking sleep patterns using accelerometers and heart rate sensors. To support sleep research in naturalistic environments using wearable sensors, we introduce the TILES-2018 Sleep Benchmark dataset, which we make publicly available to the research community. This dataset was collected over a 10-week period from 139 hospital employees and includes over 6,000 unique sleep recordings, alongside self-reported survey data from each participant, which includes sleep quality, stress, and anxiety among other measurements. We present in-depth analyses of sleep patterns by combining the TILES-2018 Sleep Benchmark dataset with a previously released dataset (TILES-2018), which follows a similar study protocol. Our analyses include sleep duration, sleep stages, and sleep diaries. Moreover, we report machine learning benchmarks using this dataset as a testbed for tasks including sleep stage classification, prediction of self-reported sleep quality, and classifying demographics. Overall, this dataset provides a valuable resource for advancing foundational studies in sleep behavior modeling.
中文:TILES-2018睡眠基准数据集通过可穿戴传感器支持真实环境下的睡眠研究,提供超过6000条睡眠记录和调查数据,以推动睡眠行为建模与机器学习应用的发展。
English: The TILES-2018 Sleep Benchmark dataset enables real-world sleep research through wearable sensors, offering over 6,000 sleep recordings and survey data to advance sleep behavior modeling and machine learning applications.
Authors:Jiayi Ye, Chaoran Chen, Yue Huang, Yanfang Ye, Toby Jia-Jun Li, Xiangliang Zhang
Abstract:
AI VTubers, where the performer is not human but algorithmically generated, introduce a new context for fandom. While human VTubers have been substantially studied for their cultural appeal, parasocial dynamics, and community economies, little is known about how audiences engage with their AI counterparts. To address this gap, we present a qualitative study of Neuro-sama, the most prominent AI VTuber. Our findings show that engagement is anchored in active co-creation: audiences are drawn by the AI's unpredictable yet entertaining interactions, cement loyalty through collective emotional events that trigger anthropomorphic projection, and sustain attachment via the AI's consistent persona. Financial support emerges not as a reward for performance but as a participatory mechanism for shaping livestream content, establishing a resilient fan economy built on ongoing interaction. These dynamics reveal how AI Vtuber fandom reshapes fan-creator relationships and offer implications for designing transparent and sustainable AI-mediated communities.
中文: AI虚拟主播通过不可预测的互动、情感拟人化投射和参与式经济支持重塑粉丝关系,其粉丝经济建立在持续互动基础上,为构建可持续的AI社群提供新范式。
English: AI VTubers foster audience engagement through unpredictable interactions, emotional anthropomorphic projection, and participatory financial support, reshaping fan-creator relationships and community economies.
Authors:Maciej Besta, Shriram Chandran, Robert Gerstenberger, Mathis Lindner, Marcin Chrapek, Sebastian Hermann Martschat, Taraneh Ghandi, Patrick Iff, Hubert Niewiadomski, Piotr Nyczyk, Jürgen Müller, Torsten Hoefler
Abstract:
We introduce MBTI-in-Thoughts, a framework for enhancing the effectiveness of Large Language Model (LLM) agents through psychologically grounded personality conditioning. Drawing on the Myers-Briggs Type Indicator (MBTI), our method primes agents with distinct personality archetypes via prompt engineering, enabling control over behavior along two foundational axes of human psychology, cognition and affect. We show that such personality priming yields consistent, interpretable behavioral biases across diverse tasks: emotionally expressive agents excel in narrative generation, while analytically primed agents adopt more stable strategies in game-theoretic settings. Our framework supports experimenting with structured multi-agent communication protocols and reveals that self-reflection prior to interaction improves cooperation and reasoning quality. To ensure trait persistence, we integrate the official 16Personalities test for automated verification. While our focus is on MBTI, we show that our approach generalizes seamlessly to other psychological frameworks such as Big Five, HEXACO, or Enneagram. By bridging psychological theory and LLM behavior design, we establish a foundation for psychologically enhanced AI agents without any fine-tuning.
中文: MBTI-in-Thoughts框架通过基于MBTI的人格提示工程增强大语言模型代理,使其在叙事生成和策略推理等任务中表现出稳定且可解释的行为模式,无需微调即可实现心理理论驱动的AI行为设计。
English: The MBTI-in-Thoughts framework enhances LLM agents by integrating MBTI personality conditioning through prompt engineering, enabling consistent behavioral biases and improved performance in tasks like narrative generation and strategic reasoning without fine-tuning.
Authors:Di Wen, Junwei Zheng, Ruiping Liu, Yi Xu, Kunyu Peng, Rainer Stiefelhagen
Abstract:
Industrial assembly tasks increasingly demand rapid adaptation to complex procedures and varied components, yet are often conducted in environments with limited computing, connectivity, and strict privacy requirements. These constraints make conventional cloud-based or fully autonomous solutions impractical for factory deployment. This paper introduces a mobile-device-based assistant system for industrial training and operational support, enabling real-time, semi-hands-free interaction through on-device perception and voice interfaces. The system integrates lightweight object detection, speech recognition, and Retrieval-Augmented Generation (RAG) into a modular on-device pipeline that operates entirely on-device, enabling intuitive support for part handling and procedure understanding without relying on manual supervision or cloud services. To enable scalable training, we adopt an automated data construction pipeline and introduce a two-stage refinement strategy to improve visual robustness under domain shift. Experiments on our generated dataset, i.e., Gear8, demonstrate improved robustness to domain shift and common visual corruptions. A structured user study further confirms its practical viability, with positive user feedback on the clarity of the guidance and the quality of the interaction. These results indicate that our framework offers a deployable solution for real-time, privacy-preserving smart assistance in industrial environments. We will release the Gear8 dataset and source code upon acceptance.
中文: 本文提出了一种基于移动设备的工业培训与操作辅助系统,通过集成轻量级感知与语音交互实现无需云端依赖的实时隐私保护指导,实验验证了其在领域适应性及用户交互质量方面的显著优势。
English: This paper presents a mobile-based assistant system for industrial training and support, utilizing on-device perception and voice interfaces to provide real-time, privacy-preserving guidance without cloud dependency, validated by improved robustness and positive user feedback.
Authors:Runcong Zhao, Qinglin Zhu, Hainiu Xu, Bin Liang, Lin Gui, Yulan He
Abstract:
Understanding character relationships is essential for interpreting complex narratives and conducting socially grounded AI research. However, manual annotation is time-consuming and low in coverage, while large language models (LLMs) often produce hallucinated or logically inconsistent outputs. We present SymbolicThought, a human-in-the-loop framework that combines LLM-based extraction with symbolic reasoning. The system constructs editable character relationship graphs, refines them using seven types of logical constraints, and enables real-time validation and conflict resolution through an interactive interface. To support logical supervision and explainable social analysis, we release a dataset of 160 interpersonal relationships with corresponding logical structures. Experiments show that SymbolicThought improves annotation accuracy and consistency while significantly reducing time cost, offering a practical tool for narrative understanding, explainable AI, and LLM evaluation.
Chinese: SymbolicThought是一种人机协同框架,结合大语言模型提取与符号推理,构建可编辑的角色关系图,通过逻辑约束和交互界面显著提升叙事理解与可解释AI的标注准确性和效率。
English: SymbolicThought is a human-in-the-loop framework that integrates LLM-based extraction with symbolic reasoning to construct editable character relationship graphs, improving annotation accuracy and efficiency for narrative understanding and explainable AI.
Authors:Yunbo Lyu, Zhou Yang, Jieke Shi, Jianming Chang, Yue Liu, David Lo
Abstract:
This paper aims to explore fundamental questions in the era when AI coding assistants like GitHub Copilot are widely adopted: what do developers truly value and criticize in AI coding assistants, and what does this reveal about their needs and expectations in real-world software development? Unlike previous studies that conduct observational research in controlled and simulated environments, we analyze extensive, first-hand user reviews of AI coding assistants, which capture developers' authentic perspectives and experiences drawn directly from their actual day-to-day work contexts. We identify 1,085 AI coding assistants from the Visual Studio Code Marketplace. Although they only account for 1.64% of all extensions, we observe a surge in these assistants: over 90% of them are released within the past two years. We then manually analyze the user reviews sampled from 32 AI coding assistants that have sufficient installations and reviews to construct a comprehensive taxonomy of user concerns and feedback about these assistants. We manually annotate each review's attitude when mentioning certain aspects of coding assistants, yielding nuanced insights into user satisfaction and dissatisfaction regarding specific features, concerns, and overall tool performance. Built on top of the findings-including how users demand not just intelligent suggestions but also context-aware, customizable, and resource-efficient interactions-we propose five practical implications and suggestions to guide the enhancement of AI coding assistants that satisfy user needs.
中文摘要:本研究通过分析AI编程助手的真实用户评价,揭示了开发者不仅需要智能建议,更追求情境感知、可定制和资源高效的交互,并据此提出五项改进建议以满足用户需求。
English Summary: This study analyzes real user reviews of AI coding assistants to understand developers' needs and criticisms, revealing that users demand intelligent, context-aware, customizable, and efficient tools, and offers five practical suggestions for improvement.
Authors:Ramaneswaran Selvakumar, Ashish Seth, Nishit Anand, Utkarsh Tyagi, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha
Abstract:
The rapid progress of Large Language Models (LLMs) has empowered omni models to act as voice assistants capable of understanding spoken dialogues. These models can process multimodal inputs beyond text, such as speech and visual data, enabling more context-aware interactions. However, current benchmarks fall short in comprehensively evaluating how well these models generate context-aware responses, particularly when it comes to implicitly understanding fine-grained speech characteristics, such as pitch, emotion, timbre, and volume or the environmental acoustic context such as background sounds. Additionally, they inadequately assess the ability of models to align paralinguistic cues with complementary visual signals to inform their responses. To address these gaps, we introduce MultiVox, the first omni voice assistant benchmark designed to evaluate the ability of voice assistants to integrate spoken and visual cues including paralinguistic speech features for truly multimodal understanding. Specifically, MultiVox includes 1000 human-annotated and recorded speech dialogues that encompass diverse paralinguistic features and a range of visual cues such as images and videos. Our evaluation on 9 state-of-the-art models reveals that, although humans excel at these tasks, current models consistently struggle to produce contextually grounded responses.
中文摘要:MultiVox基准测试旨在解决现有评估在衡量语音助手整合副语言特征和视觉线索以生成情境感知响应能力方面的不足,研究发现尽管人类擅长此类任务,现有模型仍难以实现真正的多模态理解。
English Summary: The MultiVox benchmark is introduced to address the limitations of current evaluations in assessing voice assistants' ability to integrate paralinguistic speech features and visual cues for context-aware responses, revealing that existing models struggle with these multimodal tasks despite human proficiency.
Authors:Shunyu Liu, Minghao Liu, Huichi Zhou, Zhenyu Cui, Yang Zhou, Yuhao Zhou, Wendong Fan, Ge Zhang, Jiajun Shi, Weihao Xuan, Jiaxing Huang, Shuang Luo, Fang Wu, Heli Qi, Qingcheng Zeng, Ziqi Ren, Jialiang Gao, Jindi Lv, Junjie Wang, Aosong Feng, Heng Zhou, Wangchunshu Zhou, Zhenfei Yin, Wenlong Zhang, Guohao Li, Wenhao Yu, Irene Li, Lei Ma, Lei Bai, Qunshu Lin, Mingli Song, Dacheng Tao
Abstract:
Recent studies have delved into constructing autonomous agents capable of performing complex Graphical User Interface (GUI)-based computer tasks, with the potential to revolutionize human-computer interaction. Despite encouraging results, existing efforts mainly focus on short-term interactions and rely on outcome-only verification, thereby limiting their scalability in real-world GUI applications that demand long-horizon task decomposition and execution. In this work, we introduce VeriGUI, a novel verifiable long-chain GUI dataset designed to facilitate the development and evaluation of generalist GUI agents operating in realistic computer environments. Our dataset emphasizes two critical dimensions: (1) long-chain complexity, with tasks decomposed into a sequence of interdependent subtasks spanning hundreds of steps, explicitly designed to allow any subtask to serve as a valid starting point; and (2) subtask-level verifiability, which enables diverse exploration strategies within each subtask, while ensuring that each subtask-level goal remains verifiable and consistent. The dataset consists of GUI task trajectories across both desktop and web, annotated by human experts. Extensive experiments on VeriGUI using various agents with different foundation models reveal significant performance gaps in handling long-horizon tasks, highlighting the need for more robust planning and decision-making capabilities in GUI agents.
Chinese: 近期研究推出VeriGUI,一个可验证的长链图形用户界面数据集,旨在通过处理长程任务复杂性和子任务级可验证性来开发和评估通用GUI代理,实验显示现有代理在性能上存在显著差距。
English: Recent research introduces VeriGUI, a verifiable long-chain GUI dataset designed to develop and evaluate generalist GUI agents by addressing long-horizon task complexity and subtask-level verifiability, revealing significant performance gaps in existing agents.
Authors:A. Ali Heydari, Ken Gu, Vidya Srinivas, Hong Yu, Zhihan Zhang, Yuwei Zhang, Akshay Paruchuri, Qian He, Hamid Palangi, Nova Hammerquist, Ahmed A. Metwally, Brent Winslow, Yubin Kim, Kumar Ayush, Yuzhe Yang, Girish Narayanswamy, Maxwell A. Xu, Jake Garrison, Amy Armento Lee, Jenny Vafeiadou, Ben Graef, Isaac R. Galatzer-Levy, Erik Schenck, Andrew Barakat, Javier Perez, Jacqueline Shreibati, John Hernandez, Anthony Z. Faranesh, Javier L. Prieto, Connor Heneghan, Yun Liu, Jiening Zhan, Mark Malhotra, Shwetak Patel, Tim Althoff, Xin Liu, Daniel McDuff, Xuhai "Orson" Xu
Abstract:
Health is a fundamental pillar of human wellness, and the rapid advancements in large language models (LLMs) have driven the development of a new generation of health agents. However, the application of health agents to fulfill the diverse needs of individuals in daily non-clinical settings is underexplored. In this work, we aim to build a comprehensive personal health agent that is able to reason about multimodal data from everyday consumer wellness devices and common personal health records, and provide personalized health recommendations. To understand end-users' needs when interacting with such an assistant, we conducted an in-depth analysis of web search and health forum queries, alongside qualitative insights from users and health experts gathered through a user-centered design process. Based on these findings, we identified three major categories of consumer health needs, each of which is supported by a specialist sub-agent: (1) a data science agent that analyzes personal time-series wearable and health record data, (2) a health domain expert agent that integrates users' health and contextual data to generate accurate, personalized insights, and (3) a health coach agent that synthesizes data insights, guiding users using a specified psychological strategy and tracking users' progress. Furthermore, we propose and develop the Personal Health Agent (PHA), a multi-agent framework that enables dynamic, personalized interactions to address individual health needs. To evaluate each sub-agent and the multi-agent system, we conducted automated and human evaluations across 10 benchmark tasks, involving more than 7,000 annotations and 1,100 hours of effort from health experts and end-users. Our work represents the most comprehensive evaluation of a health agent to date and establishes a strong foundation towards the futuristic vision of a personal health agent accessible to everyone.
中文摘要:大语言模型推动了健康助手的发展,但其在日常非临床场景中的应用仍待探索,本研究构建了一个综合个人健康助手,通过专业子模块整合多模态数据和用户需求,实现个性化健康管理。
English Summary: Large language models are advancing health agents, yet their daily non-clinical applications remain underexplored, prompting the development of a comprehensive personal health agent that integrates multimodal data and user insights through specialized sub-agents for personalized health management.
Authors:Yaqiong Li, Peng Zhang, Lin Wang, Hansu Gu, Siyuan Qiao, Ning Gu, Tun Lu
Abstract:
Risk perception is subjective, and youth's understanding of toxic content differs from that of adults. Although previous research has conducted extensive studies on toxicity detection in social media, the investigation of youth's unique toxicity, i.e., languages perceived as nontoxic by adults but toxic as youth, is ignored. To address this gap, we aim to explore: 1) What are the features of ``youth-toxicity'' languages in social media (RQ1); 2) Can existing toxicity detection techniques accurately detect these languages (RQ2). For these questions, we took Chinese youth as the research target, constructed the first Chinese ``youth-toxicity'' dataset, and then conducted extensive analysis. Our results suggest that youth's perception of these is associated with several contextual factors, like the source of an utterance and text-related features. Incorporating these meta information into current toxicity detection methods significantly improves accuracy overall. Finally, we propose several insights into future research on youth-centered toxicity detection.
中文摘要:青少年对社交媒体中有毒内容的感知与成人不同,本研究通过构建首个中文“青少年毒性”数据集,揭示了其语言特征并利用上下文信息显著提升了检测准确性。
English Summary: Youth perceive toxicity in social media differently from adults, and this study identifies unique "youth-toxic" language features and improves detection accuracy by incorporating contextual factors.
Authors:LuÃs F. Gomes, Xin Zhou, David Lo, Rui Abreu
Abstract:
Visual documentation is an effective tool for reducing the cognitive barrier developers face when understanding unfamiliar code, enabling more intuitive comprehension. Compared to textual documentation, it provides a higher-level understanding of the system structure and data flow. Developers usually prefer visual representations over lengthy textual descriptions for large software systems. Visual documentation is both difficult to produce and challenging to evaluate. Manually creating it is time-consuming, and currently, no existing approach can automatically generate high-level visual documentation directly from code. Its evaluation is often subjective, making it difficult to standardize and automate. To address these challenges, this paper presents the first exploration of using agentic LLM systems to automatically generate visual documentation. We introduce VisDocSketcher, the first agent-based approach that combines static analysis with LLM agents to identify key elements in the code and produce corresponding visual representations. We propose a novel evaluation framework, AutoSketchEval, for assessing the quality of generated visual documentation using code-level metrics. The experimental results show that our approach can valid visual documentation for 74.4% of the samples. It shows an improvement of 26.7-39.8% over a simple template-based baseline. Our evaluation framework can reliably distinguish high-quality (code-aligned) visual documentation from low-quality (non-aligned) ones, achieving an AUC exceeding 0.87. Our work lays the foundation for future research on automated visual documentation by introducing practical tools that not only generate valid visual representations but also reliably assess their quality.
中文: 本文提出了首个基于智能体与LLM的视觉文档自动生成系统VisDocSketcher,并开发了AutoSketchEval评估框架,能可靠地区分高质量与低质量的可视化文档,为自动化视觉文档研究奠定了基础。
English: This paper introduces VisDocSketcher, the first agent-based system using LLMs to automatically generate visual documentation from code, and proposes AutoSketchEval, a novel evaluation framework that effectively assesses documentation quality with high reliability.
Authors:Reuben A. Luera, Ryan Rossi, Franck Dernoncourt, Samyadeep Basu, Sungchul Kim, Subhojyoti Mukherjee, Puneet Mathur, Ruiyi Zhang, Jihyung Kil, Nedim Lipka, Seunghyun Yoon, Jiuxiang Gu, Zichao Wang, Cindy Xiong Bearfield, Branislav Kveton
Abstract:
In an ideal design pipeline, user interface (UI) design is intertwined with user research to validate decisions, yet studies are often resource-constrained during early exploration. Recent advances in multimodal large language models (MLLMs) offer a promising opportunity to act as early evaluators, helping designers narrow options before formal testing. Unlike prior work that emphasizes user behavior in narrow domains such as e-commerce with metrics like clicks or conversions, we focus on subjective user evaluations across varied interfaces. We investigate whether MLLMs can mimic human preferences when evaluating individual UIs and comparing them. Using data from a crowdsourcing platform, we benchmark GPT-4o, Claude, and Llama across 30 interfaces and examine alignment with human judgments on multiple UI factors. Our results show that MLLMs approximate human preferences on some dimensions but diverge on others, underscoring both their potential and limitations in supplementing early UX research.
Authors:Mackenzie Jorgensen, Kendall Brogle, Katherine M. Collins, Lujain Ibrahim, Arina Shah, Petra Ivanovic, Noah Broestl, Gabriel Piles, Paul Dongha, Hatim Abdulhussein, Adrian Weller, Jillian Powers, Umang Bhatt
Abstract:
Artificial intelligence (AI) is increasingly integrated into society, from financial services and traffic management to creative writing. Academic literature on the deployment of AI has mostly focused on the risks and harms that result from the use of AI. We introduce Fabric, a publicly available repository of deployed AI use cases to outline their governance mechanisms. Through semi-structured interviews with practitioners, we collect an initial set of 20 AI use cases. In addition, we co-design diagrams of the AI workflow with the practitioners. We discuss the oversight mechanisms and guardrails used in practice to safeguard AI use. The Fabric repository includes visual diagrams of AI use cases and descriptions of the deployed systems. Using the repository, we surface gaps in governance and find common patterns in human oversight of deployed AI systems. We intend for Fabric to serve as an extendable, evolving tool for researchers to study the effectiveness of AI governance.
中文: Fabric 作为一个可扩展的公开存储库,收录了20个已部署的AI用例及其治理机制,旨在帮助研究人员发现监管漏洞和常见的人工监督模式,以促进有效的人工智能治理。
English: The Fabric repository provides a collection of 20 deployed AI use cases with visual diagrams and governance details, aiming to help researchers identify governance gaps and patterns in human oversight for effective AI regulation.
Authors:Katherine M. Collins, Graham Todd, Cedegao E. Zhang, Adrian Weller, Julian Togelius, Junyi Chu, Lionel Wong, Thomas L. Griffiths, Joshua B. Tenenbaum
Abstract:
The human ability to learn rules and solve problems has been a central concern of cognitive science research since the field's earliest days. But we do not just follow rules and solve problems given to us by others: we modify those rules, create new problems, and set new goals and tasks for ourselves and others. Arguably, even more than rule following and problem solving, human intelligence is about creatively breaking and stretching the rules, changing the game, and inventing new problems worth thinking about. Creating a good rule or a good problem depends not just on the ideas one can think up but on how one evaluates such proposals. Here, we study invention through the lens of game design. We focus particularly on the early stages of novice, "everyday" game creation, where the stakes are low. We draw on a dataset of over 450 human created games, created by participants who saw an initial seed set of two-player grid-based strategy games. We consider two different cognitive mechanisms that may be at work during the early processes of intuitive game invention: an associative proposal based on previous games one has seen and compute-bounded model-based evaluation that an everyday game creator may use to refine their initial draft proposals. In our preliminary work, we conduct a model-based analysis of how people invented new games based on prior experience and find that generated games are best described by a model which incorporates model-based estimates of game quality at a population level. Our work points to how human invention is based not only on what people propose, but how they evaluate and offers a computational toolkit to scale empirical studies of model-based simulation in open-ended human innovation.
中文: 人类智能不仅体现在遵循规则和解决问题上,更在于创造性地打破和修改规则、发明新挑战并评估这些创新,一项基于新手游戏设计的研究通过认知机制和模型分析揭示了这一过程。
English: Human intelligence is characterized not just by following rules and solving problems but by creatively breaking and modifying them, inventing new challenges, and evaluating these innovations, as demonstrated through a study of novice game design using cognitive mechanisms and model-based analysis.
Authors:Katherine M. Collins, Kartik Chandra, Adrian Weller, Jonathan Ragan-Kelley, Joshua B. Tenenbaum
Abstract:
Why do we give the explanations we do? Recent work has suggested that we should think of explanation as a kind of cooperative social interaction, between a why-question-asker and an explainer. Here, we apply this perspective to consider the role that emotion plays in this social interaction. We develop a computational framework for modeling explainers who consider the emotional impact an explanation might have on a listener. We test our framework by using it to model human intuitions about how a doctor might explain to a patient why they have a disease, taking into account the patient's propensity for regret. Our model predicts human intuitions well, better than emotion-agnostic ablations, suggesting that people do indeed reason about emotion when giving explanations.
Chinese Summary: 该研究提出一个计算框架,模拟解释者如何考量解释对听者的情绪影响,并通过医患场景验证了人们在解释时会进行情绪推理,其表现优于忽略情绪的模型。
English Summary: The study proposes a computational framework that models how explainers consider the emotional impact on listeners, demonstrating through doctor-patient scenarios that people incorporate emotional reasoning into explanations, outperforming emotion-agnostic models.
Authors:Xiyuxing Zhang, Duc Vu, Chengyi Shen, Yuntao Wang, Yuanchun Shi, Justin Chan
Abstract:
There is growing industry interest in creating unobtrusive designs for electrooculography (EOG) sensing of eye gestures on glasses (e.g. JINS MEME and Apple eyewear). We present VergeIO, the first EOG-based glasses that enables depth-aware eye interaction using vergence with an optimized electrode layout and novel smart glass prototype. It can distinguish between four and six depth-based eye gestures with 83-98% accuracy using personalized models in a user study across 11 users and 1,320 gesture instances. It generalizes to unseen users with an accuracy of 80-98% without any calibration. To reduce false detections, we incorporate a motion artifact detection pipeline and a preamble-based activation scheme. The system uses dry sensors without any adhesives or gel, and operates in real time with 3 mW power consumption by the sensing front-end, making it suitable for always-on sensing.
中文摘要:VergeIO推出了首款基于眼电图的智能眼镜,通过优化电极布局实现深度感知的眼动交互,在无需用户校准的情况下能以83-98%的准确率识别深度手势,且仅需3毫瓦功耗即可实现实时持续感知。
English Summary: VergeIO introduces the first EOG-based smart glasses enabling depth-aware eye interaction through an optimized electrode design, achieving 83-98% accuracy for depth-based gestures with minimal power consumption and no user calibration required.
Authors:Weiyu Ma, Dongyu Xu, Shu Lin, Haifeng Zhang, Jun Wang
Abstract:
We present Adaptive Command, a novel framework integrating large language models (LLMs) with behavior trees for real-time strategic decision-making in StarCraft II. Our system focuses on enhancing human-AI collaboration in complex, dynamic environments through natural language interactions. The framework comprises: (1) an LLM-based strategic advisor, (2) a behavior tree for action execution, and (3) a natural language interface with speech capabilities. User studies demonstrate significant improvements in player decision-making and strategic adaptability, particularly benefiting novice players and those with disabilities. This work contributes to the field of real-time human-AI collaborative decision-making, offering insights applicable beyond RTS games to various complex decision-making scenarios.
中文摘要:Adaptive Command框架将大语言模型与行为树相结合,通过自然语言交互提升《星际争霸II》中的人机实时协作决策能力,尤其帮助新手和残障玩家改善战略适应性。
English Summary: Adaptive Command is a framework combining large language models and behavior trees to enhance real-time human-AI collaboration in StarCraft II through natural language, improving strategic decision-making especially for novices and players with disabilities.
Authors:Siting Wang, Minnan Pei, Luoyang Sun, Cheng Deng, Kun Shao, Zheng Tian, Haifeng Zhang, Jun Wang
Abstract:
Humans can directly imagine and manipulate visual images in their minds, a capability known as spatial visualization. While multi-modal Large Language Models (MLLMs) support imagination-based reasoning, spatial visualization remains insufficiently evaluated, typically embedded within broader mathematical and logical assessments. Existing evaluations often rely on IQ tests or math competitions that may overlap with training data, compromising assessment reliability. To this end, we introduce SpatialViz-Bench, a comprehensive multi-modal benchmark for spatial visualization with 12 tasks across 4 sub-abilities, comprising 1,180 automatically generated problems. Our evaluation of 33 state-of-the-art MLLMs not only reveals wide performance variations and demonstrates the benchmark's strong discriminative power, but also uncovers counter-intuitive findings: models show difficulty perception misaligned with human intuition, exhibit dramatic 2Dto-3D performance cliffs, default to formulaic derivation over visualization, and paradoxically suffer performance degradation from Chain-of-Thought prompting in open-source models. Through statistical and qualitative analysis of error types, SpatialViz-Bench demonstrates that state-of-the-art MLLMs continue to exhibit deficiencies in spatial visualization tasks, thereby addressing a significant lacuna in the field. The benchmark data and evaluation code are publicly available.
Chinese: 本文提出了SpatialViz-Bench这一评估多模态大模型空间可视化能力的综合基准,发现即使是最先进模型仍存在明显缺陷和反直觉行为,填补了该领域的重要空白。
English: This paper introduces SpatialViz-Bench, a comprehensive benchmark for evaluating spatial visualization in multi-modal LLMs, revealing significant performance gaps and counter-intuitive model behaviors despite current advancements.
Authors:Zheng Lian, Licai Sun, Haoyu Chen, Zebang Cheng, Fan Zhang, Ziyu Jia, Ziyang Ma, Fei Ma, Xiaojiang Peng, Jianhua Tao
Abstract:
With the recent success of large language models, Explainable Multimodal Emotion Recognition (EMER), also known as Descriptive MER (DMER), has attracted growing attention from researchers. Unlike traditional discriminative methods that rely on predefined emotion taxonomies, EMER aims to describe a person's emotional state using free-form natural language, thereby enabling fine-grained and interpretable emotion representations. However, this free-form prediction paradigm introduces significant challenges in evaluation. Existing approaches either depend on ground-truth descriptions, which require extensive manual annotations and often fail to capture the full complexity of human emotions, or simplify the evaluation task by shifting focus from assessing descriptions to evaluating emotion labels. However, this simplification overlooks critical aspects such as emotional temporal dynamics, intensity, and uncertainty. To address these limitations, we propose EMER-Ranker, a novel evaluation strategy that reformulates the traditional ``prediction-ground truth'' comparison into the ``prediction-prediction'' comparison, eliminating the need for ground-truth descriptions. We then apply the Bradley-Terry algorithm to convert pairwise comparison outcomes into model-level rankings. Additionally, we explore the potential for automatic preference prediction and introduce EMER-Preference, the first preference dataset specifically designed for human emotions. Our work advances the field of EMER and lays the foundation for more intelligent human-computer interaction systems.
中文摘要:本文提出EmoPrefer,首次探索利用多模态大语言模型实现高效的人类情感偏好标注,通过构建首个情感偏好数据集和评估基准,解决了描述性多模态情感识别中的评估难题。
English Summary: This paper introduces EmoPrefer, a pioneering study that explores using multimodal large language models to efficiently annotate human emotion preferences, addressing challenges in Descriptive Multimodal Emotion Recognition by creating the first emotion preference dataset and benchmark.
Authors:Yixin Liu, Yonghui Wu, Denghui Zhang, Lichao Sun
Abstract:
The exponential growth of scientific literature poses unprecedented challenges for researchers attempting to synthesize knowledge across rapidly evolving fields. We present \textbf{Agentic AutoSurvey}, a multi-agent framework for automated survey generation that addresses fundamental limitations in existing approaches. Our system employs four specialized agents (Paper Search Specialist, Topic Mining \& Clustering, Academic Survey Writer, and Quality Evaluator) working in concert to generate comprehensive literature surveys with superior synthesis quality. Through experiments on six representative LLM research topics from COLM 2024 categories, we demonstrate that our multi-agent approach achieves significant improvements over existing baselines, scoring 8.18/10 compared to AutoSurvey's 4.77/10. The multi-agent architecture processes 75--443 papers per topic (847 total across six topics) while targeting high citation coverage (often $\geq$80\% on 75--100-paper sets; lower on very large sets such as RLHF) through specialized agent orchestration. Our 12-dimension evaluation captures organization, synthesis integration, and critical analysis beyond basic metrics. These findings demonstrate that multi-agent architectures represent a meaningful advancement for automated literature survey generation in rapidly evolving scientific domains.
中文: Agentic AutoSurvey是一种多智能体框架,通过协同运作四个专业智能体,在快速发展的科学领域中实现了文献综述生成质量的显著提升,展现出卓越的综合能力与覆盖范围。
English: Agentic AutoSurvey is a multi-agent framework that significantly enhances automated literature survey generation by employing specialized agents to achieve superior synthesis quality and comprehensive coverage across rapidly evolving scientific fields.
Authors:Saki Hashimoto, Shoichi Hasegawa, Tomochika Ishikawa, Akira Taniguchi, Yoshinobu Hagiwara, Lotfi El Hafi, Tadahiro Taniguchi
Abstract:
Robots operating in domestic and office environments must understand object ownership to correctly execute instructions such as ``Bring me my cup.'' However, ownership cannot be reliably inferred from visual features alone. To address this gap, we propose Active Ownership Learning (ActOwL), a framework that enables robots to actively generate and ask ownership-related questions to users. ActOwL employs a probabilistic generative model to select questions that maximize information gain, thereby acquiring ownership knowledge efficiently to improve learning efficiency. Additionally, by leveraging commonsense knowledge from Large Language Models (LLM), objects are pre-classified as either shared or owned, and only owned objects are targeted for questioning. Through experiments in a simulated home environment and a real-world laboratory setting, ActOwL achieved significantly higher ownership clustering accuracy with fewer questions than baseline methods. These findings demonstrate the effectiveness of combining active inference with LLM-guided commonsense reasoning, advancing the capability of robots to acquire ownership knowledge for practical and socially appropriate task execution.
中文:ActOwL框架通过概率模型主动生成所有权问题,并利用大语言模型的常识推理预筛选对象,在模拟和真实环境中以更少提问显著提升了所有权识别准确率。
English: To help robots learn object ownership efficiently, the ActOwL framework actively generates targeted questions using a probabilistic model and LLM-based commonsense reasoning, achieving higher accuracy with fewer queries in both simulated and real-world tests.
Authors:Chen Sun, Wenqi Zhang, Bizhu Wang, Xiaodong Xu, Chau Yuen, Yan Zhang, Ping Zhang
Abstract:
In current vehicle-to-everything (V2X) communication systems, roadside units (RSUs) broadcast brief warning messages that alert nearby vehicles to avoid potential hazards. However, these messages lack contextual information on why a warning is issued, leading to excessive caution or inefficient driving behaviors. To avoid such a situation, we propose a semantic-enhanced and explainable V2X (SEE-V2X) system. In the proposed system, RSUs equipped with smart cameras detect obstructions and transmit context-aware messages to vehicles. By understanding both what the hazard is and why it occurs, drivers can make more intelligent decisions based on their specific driving situation. Furthermore, through a real-field demonstration, we show the new "see-through" feature in the proposed system, which enables drivers to visualize hidden pedestrians behind obstacles. We also perform simulations to compare traditional V2X with SEE-V2X under different traffic conditions. The results show that SEE-V2X significantly improves traffic efficiency and reduces unnecessary deceleration.
中文: 提出的SEE-V2X系统通过路边单元搭载的智能摄像头提供情境感知预警,使驾驶员能够可视化隐藏危险并做出明智决策,从而显著提升交通效率并减少不必要的减速行为。
English: The proposed SEE-V2X system enhances traditional V2X communication by providing context-aware warnings through smart cameras on RSUs, enabling drivers to visualize hidden hazards and make informed decisions, which significantly improves traffic efficiency and reduces unnecessary braking.
Authors:Edyta Bogucka, Marios Constantinides, Sanja Å ÄepanoviÄ, Daniele Quercia
Abstract:
Communicating the risks and benefits of AI is important for regulation and public understanding. Yet current methods such as technical reports often exclude people without technical expertise. Drawing on HCI research, we developed an Impact Assessment Card to present this information more clearly. We held three focus groups with a total of 12 participants who helped identify design requirements and create early versions of the card. We then tested a refined version in an online study with 235 participants, including AI developers, compliance experts, and members of the public selected to reflect the U.S. population by age, sex, and race. Participants used either the card or a full impact assessment report to write an email supporting or opposing a proposed AI system. The card led to faster task completion and higher-quality emails across all groups. We discuss how design choices can improve accessibility and support AI governance. Examples of cards are available at: https://social-dynamics.net/ai-risks/impact-card/.
Chinese: 研究人员开发了一种影响评估卡片,以比技术报告更易懂的方式呈现AI的风险与益处,测试显示该卡片显著提高了不同用户群体的任务效率和沟通质量。
English: Researchers developed an Impact Assessment Card to make AI risks and benefits more accessible than technical reports, which improved task efficiency and communication quality across diverse user groups in testing.
Authors:Tram Thi Minh Tran, Judy Kay, Stewart Worrall, Marius Hoggenmueller, Callum Parker, Xinyan Yu, Julie Stephany Berrio Perez, Mao Shan, Martin Tomitsch
Abstract:
External Human-Machine Interfaces (eHMIs) are key to facilitating interaction between autonomous vehicles and external road actors, yet most remain reactive and do not account for scalability and inclusivity. This paper introduces a conceptual design framework for adaptive eHMIs-interfaces that dynamically adjust communication as road actors vary and context shifts. Using the cyber-physical system as a structuring lens, the framework comprises three layers: Input (what the system detects), Processing (how the system decides), and Output (how the system communicates). Developed through theory-led abstraction and expert discussion, the framework helps researchers and designers think systematically about adaptive eHMIs and provides a structured tool to design, analyse, and assess adaptive communication strategies. We show how such systems may resolve longstanding limitations in eHMI research while raising new ethical and technical considerations.
中文: 本文提出了一种自适应外部人机接口的概念框架,通过动态调整通信来应对道路参与者和情境变化,解决了可扩展性和包容性不足的问题,同时提供了结构化设计层次并引发了新的伦理思考。
English: This paper proposes a conceptual framework for adaptive external Human-Machine Interfaces (eHMIs) that dynamically adjust communication based on road actors and context, addressing scalability and inclusivity limitations while introducing structured design layers and raising new ethical considerations.
Authors:Luyu Chen, Quanyu Dai, Zeyu Zhang, Xueyang Feng, Mingyu Zhang, Pengcheng Tang, Xu Chen, Yue Zhu, Zhenhua Dong
Abstract:
Conversational recommender systems (CRS) enhance user experience through multi-turn interactions, yet evaluating CRS remains challenging. User simulators can provide comprehensive evaluations through interactions with CRS, but building realistic and diverse simulators is difficult. While recent work leverages large language models (LLMs) to simulate user interactions, they still fall short in emulating individual real users across diverse scenarios and lack explicit rating mechanisms for quantitative evaluation. To address these gaps, we propose RecUserSim, an LLM agent-based user simulator with enhanced simulation realism and diversity while providing explicit scores. RecUserSim features several key modules: a profile module for defining realistic and diverse user personas, a memory module for tracking interaction history and discovering unknown preferences, and a core action module inspired by Bounded Rationality theory that enables nuanced decision-making while generating more fine-grained actions and personalized responses. To further enhance output control, a refinement module is designed to fine-tune final responses. Experiments demonstrate that RecUserSim generates diverse, controllable outputs and produces realistic, high-quality dialogues, even with smaller base LLMs. The ratings generated by RecUserSim show high consistency across different base LLMs, highlighting its effectiveness for CRS evaluation.
中文摘要:RecUserSim是一种基于大语言模型的先进用户模拟器,通过整合用户画像、记忆模块和有限理性决策机制,显著提升了对话推荐系统评估的真实性与多样性,同时能提供稳定可靠的显式评分。
English Summary: RecUserSim is an advanced LLM-based user simulator that enhances realism and diversity in conversational recommender system evaluations by incorporating user profiles, memory, and bounded rationality-driven decision-making, while providing consistent explicit ratings.
Authors:Tianyu Song, Feng Li, Yuan Bi, Angelos Karlas, Amir Yousefi, Daniela Branzan, Zhongliang Jiang, Ulrich Eck, Nassir Navab
Abstract:
The advancement and maturity of large language models (LLMs) and robotics have unlocked vast potential for human-computer interaction, particularly in the field of robotic ultrasound. While existing research primarily focuses on either patient-robot or physician-robot interaction, the role of an intelligent virtual sonographer (IVS) bridging physician-robot-patient communication remains underexplored. This work introduces a conversational virtual agent in Extended Reality (XR) that facilitates real-time interaction between physicians, a robotic ultrasound system(RUS), and patients. The IVS agent communicates with physicians in a professional manner while offering empathetic explanations and reassurance to patients. Furthermore, it actively controls the RUS by executing physician commands and transparently relays these actions to the patient. By integrating LLM-powered dialogue with speech-to-text, text-to-speech, and robotic control, our system enhances the efficiency, clarity, and accessibility of robotic ultrasound acquisition. This work constitutes a first step toward understanding how IVS can bridge communication gaps in physician-robot-patient interaction, providing more control and therefore trust into physician-robot interaction while improving patient experience and acceptance of robotic ultrasound.
中文: 本研究通过扩展现实中的智能虚拟超声医生,利用大语言模型驱动的对话和机器人控制实现医-机-患三方实时交互,在提升机器人超声检查效率的同时改善了患者体验。
English: This research introduces an intelligent virtual sonographer in Extended Reality that bridges physician-robot-patient communication by enabling real-time interaction through LLM-powered dialogue and robotic control, enhancing both procedural efficiency and patient experience in robotic ultrasound.
Authors:Bevan Koopman, Guido Zuccon
Abstract:
Despite widespread debunking, many psychological myths remain deeply entrenched. This paper investigates whether Large Language Models (LLMs) mimic human behaviour of myth belief and explores methods to mitigate such tendencies. Using 50 popular psychological myths, we evaluate myth belief across multiple LLMs under different prompting strategies, including retrieval-augmented generation and swaying prompts. Results show that LLMs exhibit significantly lower myth belief rates than humans, though user prompting can influence responses. RAG proves effective in reducing myth belief and reveals latent debiasing potential within LLMs. Our findings contribute to the emerging field of Machine Psychology and highlight how cognitive science methods can inform the evaluation and development of LLM-based systems.
中文摘要:研究发现大型语言模型对心理学谬误的相信程度低于人类,但用户提示会影响其回答,而检索增强生成能有效减少谬误相信并揭示其潜在的去偏见能力。
English Summary: This study finds that large language models exhibit lower belief in psychological myths than humans, but their responses can be influenced by user prompts, with retrieval-augmented generation effectively reducing myth belief and revealing latent debiasing capabilities.
Authors:Gongwei Chen, Xurui Zhou, Rui Shao, Yibo Lyu, Kaiwen Zhou, Shuai Wang, Wentao Li, Yinchuan Li, Zhongang Qi, Liqiang Nie
Abstract:
The research focus of GUI agents is shifting from text-dependent to pure-vision-based approaches, which, though promising, prioritize comprehensive pre-training data collection while neglecting contextual modeling challenges. We probe the characteristics of element and history contextual modeling in GUI agent and summarize: 1) the high-density and loose-relation of element context highlight the existence of many unrelated elements and their negative influence; 2) the high redundancy of history context reveals the inefficient history modeling in current GUI agents. In this work, we propose a context-aware simplification framework for building an efficient and effective GUI Agent, termed SimpAgent. To mitigate potential interference from numerous unrelated elements, we introduce a masking-based element pruning method that circumvents the intractable relation modeling through an efficient masking mechanism. To reduce the redundancy in historical information, we devise a consistency-guided history compression module, which enhances implicit LLM-based compression through innovative explicit guidance, achieving an optimal balance between performance and efficiency. With the above components, SimpAgent reduces 27% FLOPs and achieves superior GUI navigation performances. Comprehensive navigation experiments across diverse web and mobile environments demonstrate the effectiveness and potential of our agent.
中文: 本研究提出SimpAgent框架,通过屏蔽无关界面元素和压缩冗余历史记录来优化GUI代理的上下文建模,在减少27%计算量的同时显著提升了跨平台导航性能。
English: The study introduces SimpAgent, a context-aware framework that enhances GUI agent efficiency by pruning unrelated elements and compressing redundant history, reducing computational load by 27% while improving navigation performance.
Authors:Weizhi Chen, Ziwei Wang, Leyang Yang, Sheng Zhou, Xiaoxuan Tang, Jiajun Bu, Yong Li, Wei Jiang
Abstract:
Graphical User Interface (GUI) agents possess significant commercial and social value, and GUI agents powered by advanced multimodal large language models (MLLMs) have demonstrated remarkable potential. Currently, existing GUI agents usually utilize sequential episodes of multi-step operations across pages as the prior GUI knowledge, which fails to capture the complex transition relationship between pages, making it challenging for the agents to deeply perceive the GUI environment and generalize to new scenarios. Therefore, we design an automated pipeline to transform the sequential episodes into page graphs, which explicitly model the graph structure of the pages that are naturally connected by actions. To fully utilize the page graphs, we further introduce Retrieval-Augmented Generation (RAG) technology to effectively retrieve reliable perception guidelines of GUI from them, and a tailored multi-agent framework PG-Agent with task decomposition strategy is proposed to be injected with the guidelines so that it can generalize to unseen scenarios. Extensive experiments on various benchmarks demonstrate the effectiveness of PG-Agent, even with limited episodes for page graph construction.
中文:基于多模态大语言模型的图形用户界面代理潜力巨大,但现有方法依赖顺序操作难以捕捉页面间复杂转换关系,为此设计了将操作序列转化为页面图结构的方法,结合检索增强生成技术和多代理框架,实验证明该方案在有限数据下仍能有效泛化至新场景。
English: GUI agents leveraging multimodal large language models show great potential, but current approaches using sequential operations struggle with page transitions and generalization, prompting the development of a page graph-based framework enhanced with retrieval-augmented generation and a multi-agent system that demonstrates strong performance across benchmarks even with limited data.
Authors:Seoyoung Doh, Hyeon Jeon, Sungbok Shin, Ghulam Jilani Quadri, Nam Wook Kim, Jinwook Seo
Abstract:
Selecting the dimensionality reduction technique that faithfully represents the structure is essential for reliable visual communication and analytics. In reality, however, practitioners favor projections for other attractions, such as aesthetics and visual saliency, over the projection's structural faithfulness, a bias we define as visual interestingness. In this research, we conduct a user study that (1) verifies the existence of such bias and (2) explains why the bias exists. Our study suggests that visual interestingness biases practitioners' preferences when selecting projections for analysis, and this bias intensifies with color-encoded labels and shorter exposure time. Based on our findings, we discuss strategies to mitigate bias in perceiving and interpreting DR projections.
中文: 本研究证实了实践者存在视觉趣味性偏见,即倾向于选择美观而非结构保真的降维投影,这种偏见在彩色标签和短暂暴露下加剧,并提出了缓解策略。
English: This study confirms that practitioners exhibit a visual interestingness bias, favoring aesthetically appealing projections over structurally faithful ones, which intensifies with color labels and brief exposure, and proposes mitigation strategies.
Authors:Paulo Mendes, Eva Maia, Isabel Praça
Abstract:
Phishing emails continue to pose a significant threat to cybersecurity by exploiting human vulnerabilities through deceptive content and malicious payloads. While Machine Learning (ML) models are effective at detecting phishing threats, their performance largely relies on the quality and diversity of the training data. This paper presents MeAJOR (Merged email Assets from Joint Open-source Repositories) Corpus, a novel, multi-source phishing email dataset designed to overcome critical limitations in existing resources. It integrates 135894 samples representing a broad number of phishing tactics and legitimate emails, with a wide spectrum of engineered features. We evaluated the dataset's utility for phishing detection research through systematic experiments with four classification models (RF, XGB, MLP, and CNN) across multiple feature configurations. Results highlight the dataset's effectiveness, achieving 98.34% F1 with XGB. By integrating broad features from multiple categories, our dataset provides a reusable and consistent resource, while addressing common challenges like class imbalance, generalisability and reproducibility.
Chinese Summary: 本文提出MeAJOR语料库,这是一个整合多来源钓鱼邮件和合法邮件的新型数据集,通过综合特征工程有效提升检测性能,在XGBoost分类器中实现了98.34%的F1分数,解决了现有数据集的泛化性和可复现性等关键问题。
English Summary: This paper introduces the MeAJOR Corpus, a comprehensive multi-source phishing email dataset that enhances detection capabilities by integrating diverse samples and engineered features, achieving 98.34% F1 score with XGBoost in evaluations.
Authors:Hyeon Jeon, Jeongin Park, Soohyun Lee, Dae Hyun Kim, Sungbok Shin, Jinwook Seo
Abstract:
Selecting the appropriate dimensionality reduction (DR) technique and determining its optimal hyperparameter settings that maximize the accuracy of the output projections typically involves extensive trial and error, often resulting in unnecessary computational overhead. To address this challenge, we propose a dataset-adaptive approach to DR optimization guided by structural complexity metrics. These metrics quantify the intrinsic complexity of a dataset, predicting whether higher-dimensional spaces are necessary to represent it accurately. Since complex datasets are often inaccurately represented in two-dimensional projections, leveraging these metrics enables us to predict the maximum achievable accuracy of DR techniques for a given dataset, eliminating redundant trials in optimizing DR. We introduce the design and theoretical foundations of these structural complexity metrics. We quantitatively verify that our metrics effectively approximate the ground truth complexity of datasets and confirm their suitability for guiding dataset-adaptive DR workflow. Finally, we empirically show that our dataset-adaptive workflow significantly enhances the efficiency of DR optimization without compromising accuracy.
中文: 本研究提出了一种基于数据集结构复杂度度量的自适应降维优化方法,通过预测可达到的最高精度来避免冗余试验,在不影响性能的前提下显著提升了优化效率。
English: This study introduces a dataset-adaptive dimensionality reduction optimization method that uses structural complexity metrics to predict the maximum achievable accuracy, eliminating redundant trials and significantly improving efficiency without compromising performance.
Authors:Chayapatr Archiwaranguprok, Awu Chen, Sheer Karny, Hiroshi Ishii, Pattie Maes, Pat Pataranutaporn
Abstract:
Human-AI interaction researchers face an overwhelming challenge: synthesizing insights from thousands of empirical studies to understand how AI impacts people and inform effective design. Existing approach for literature reviews cluster papers by similarities, keywords or citations, missing the crucial cause-and-effect relationships that reveal how design decisions impact user outcomes. We introduce the Atlas of Human-AI Interaction, an interactive web interface that provides the first systematic mapping of empirical findings across 1,000+ HCI papers using LLM-powered knowledge extraction. Our approach identifies causal relationships, and visualizes them through an AI-enabled interactive web interface as a navigable knowledge graph. We extracted 2,037 empirical findings, revealing research topic clusters, common themes, and disconnected areas. Expert evaluation with 20 researchers revealed the system's effectiveness for discovering research gaps. This work demonstrates how AI can transform literature synthesis itself, offering a scalable framework for evidence-based design, opening new possibilities for computational meta-science across HCI and beyond.
Chinese: 《人机交互图谱》通过基于大语言模型的知识提取系统,从千余篇人机交互论文中梳理并可视化因果关系,构建可交互知识图谱,帮助研究者发现研究空白,为人机交互及其他领域的计算元科学研究提供了新范式。
English: The Atlas of Human-AI Interaction addresses the challenge of synthesizing vast empirical literature by using an LLM-powered system to extract and visualize causal relationships from over 1,000 HCI papers, enabling researchers to identify research gaps through an interactive knowledge graph.
Authors:Shuning Zhang, Hong Jia, Simin Li, Ting Dang, Yongquan `Owen' Hu, Xin Yi, Hewu Li
Abstract:
The reasoning capabilities of embodied agents introduce a critical, under-explored inferential privacy challenge, where the risk of an agent generate sensitive conclusions from ambient data. This capability creates a fundamental tension between an agent's utility and user privacy, rendering traditional static controls ineffective. To address this, this position paper proposes a framework that reframes privacy as a dynamic learning problem grounded in theory of Contextual Integrity (CI). Our approach enables agents to proactively learn and adapt to individual privacy norms through interaction, outlining a research agenda to develop embodied agents that are both capable and function as trustworthy safeguards of user privacy.
中文: 该立场文件针对具身智能体的推理隐私风险,提出了基于情境完整性的动态学习框架,使智能体能够通过交互适应个体隐私规范,在保障效用与隐私之间实现平衡。
English: This position paper addresses the inferential privacy risks of embodied agents by proposing a dynamic learning framework based on Contextual Integrity, enabling agents to adapt to individual privacy norms through interaction while balancing utility and privacy.
Authors:Hongxin Li, Jingran Su, Jingfan Chen, Zheng Ju, Yuntao Chen, Qing Li, Zhaoxiang Zhang
Abstract:
Building autonomous agents that perceive and operate graphical user interfaces (GUIs) like humans has long been a vision in the field of artificial intelligence. Central to these agents is the capability for GUI interaction, which involves GUI understanding and planning capabilities. Existing methods have tried developing GUI agents based on the multi-modal comprehension ability of vision-language models (VLMs). However, the limited scenario, insufficient size, and heterogeneous action spaces hinder the progress of building generalist GUI agents. To resolve these issues, this paper proposes \textbf{UIPro}, a novel generalist GUI agent trained with extensive multi-platform and multi-task GUI interaction data, coupled with a unified action space. We first curate a comprehensive dataset encompassing 20.6 million GUI understanding tasks to pre-train UIPro, granting it a strong GUI grounding capability, which is key to downstream GUI agent tasks. Subsequently, we establish a unified action space to harmonize heterogeneous GUI agent task datasets and produce a merged dataset to foster the action prediction ability of UIPro via continued fine-tuning. Experimental results demonstrate UIPro's superior performance across multiple GUI task benchmarks on various platforms, highlighting the effectiveness of our approach.
中文: 本文提出UIPro通用图形界面代理,通过整合多平台数据和统一动作空间解决现有方法的局限性,在各类图形界面任务基准测试中均表现出卓越性能。
English: This paper introduces UIPro, a generalist GUI agent trained on extensive multi-platform data with a unified action space to overcome limitations in existing methods, achieving superior performance across various GUI task benchmarks.
Authors:Shuning Zhang, Sixing Tao, Eve He, Yuting Yang, Ying Ma, Ailei Wang, Xin Yi, Hewu Li
Abstract:
Privacy policies are lengthy and complex, leading to user neglect. While contextual privacy policies (CPPs) present information at the point of risk, they may lack engagement and disrupt tasks. We propose Conflect, an interactive CPP for mobile apps, guided by a reflective thinking framework. Through three workshops with experienced designers and researchers, we constructed the design space of reflective thinking-based CPP design, and identified the disconnect between context and action as the most critical problem. Based on participants' feedback, we designed Conflect to use sidebar alerts, allowing users to reflect on contextualized risks and fostering their control. Our system contextually detects privacy risks, extracts policy segments, and automatically generates risk descriptions with 94.0% policy extraction accuracy on CPP4APP dataset and a 4.35s latency. A user study (N=28) demonstrated that Conflect improves user understanding, trust, and satisfaction while lowering cognitive load compared to CPPs, privacy policies and privacy labels.
中文总结:Conflect是一种基于反思思维框架的交互式情境隐私政策系统,通过侧边栏风险提示帮助用户理解情境化隐私风险,在提高用户理解度、信任感和满意度的同时有效降低认知负担。
English Summary: Conflect is an interactive contextual privacy policy system for mobile apps that enhances user understanding, trust, and satisfaction by providing real-time risk alerts and reducing cognitive load through reflective thinking.
Authors:Shuning Zhang, Yutong Jiang, Rongjun Ma, Yuting Yang, Mingyao Xu, Zhixin Huang, Xin Yi, Hewu Li
Abstract:
While web agents gained popularity by automating web interactions, their requirement for interface access introduces significant privacy risks that are understudied, particularly from users' perspective. Through a formative study (N=15), we found users frequently misunderstand agents' data practices, and desired unobtrusive, transparent data management. To achieve this, we designed and implemented PrivWeb, a trusted add-on on web agents that utilizes a localized LLM to anonymize private information on interfaces according to user preferences. It features privacy categorization schema and adaptive notifications that selectively pauses tasks for user control over information collection for highly sensitive information, while offering non-disruptive options for less sensitive information, minimizing human oversight. The user study (N=14) across travel, information retrieval, shopping, and entertainment tasks compared PrivWeb with baselines without notification and without control for private information access, where PrivWeb reduced perceived privacy risks with no associated increase in cognitive effort, and resulted in higher overall satisfaction.
中文摘要:PrivWeb作为一款浏览器插件,通过本地化LLM根据用户偏好对网页界面上的隐私信息进行匿名化处理,其自适应通知功能在用户研究中被证明能有效降低隐私风险且不增加认知负担。
English Summary: PrivWeb is a browser add-on that uses a localized LLM to anonymize private information on web interfaces according to user preferences, featuring adaptive notifications that reduce privacy risks without increasing cognitive effort, as demonstrated in user studies.
Authors:Pat Pataranutaporn, Sheer Karny, Chayapatr Archiwaranguprok, Constanze Albrecht, Auren R. Liu, Pattie Maes
Abstract:
The emergence of AI companion applications has created novel forms of intimate human-AI relationships, yet empirical research on these communities remains limited. We present the first large-scale computational analysis of r/MyBoyfriendIsAI, Reddit's primary AI companion community (27,000+ members). Using exploratory qualitative analysis and quantitative analysis employing classifiers, we identify six primary conversation themes, with visual sharing of couple pictures and ChatGPT-specific discussions dominating the discourse of the most viewed posts. Through analyzing the top posts in the community, our findings reveal how community members' AI companionship emerges unintentionally through functional use rather than deliberate seeking, with users reporting therapeutic benefits led by reduced loneliness, always-available support, and mental health improvements. Our work covers primary concerns about human intimacy with AIs such as emotional dependency, reality dissociation, and grief from model updates. We observe users materializing relationships following traditional human-human relationship customs, such as wedding rings. Community dynamics indicate active resistance to stigmatization through advocacy and mutual validation. This work contributes an empirical understanding of AI companionship as an emerging sociotechnical phenomenon.
中文摘要:本研究首次对人工智能伴侣社区进行大规模分析,揭示了用户如何通过功能性互动无意间形成具有治疗效果的陪伴关系,同时应对情感依赖和现实脱节等主要关切。
English Summary: This study provides the first large-scale analysis of an AI companion community, revealing how users unintentionally form therapeutic relationships through functional interactions while navigating concerns like emotional dependency and reality dissociation.
Authors:Shuning Zhang, Linzhi Wang, Dai Shi, Yuwei Chuai, Jingruo Chen, Yunyi Chen, Yifan Wang, Yating Wang, Xin Yi, Hewu Li
Abstract:
Community-based fact-checking is promising to reduce the spread of misleading posts at scale. However, its effectiveness can be undermined by the delays in fact-check delivery. Notably, user-initiated organic comments often contain debunking information and have the potential to help mitigate this limitation. Here, we investigate the feasibility of synthesizing comments to generate timely high-quality fact-checks. To this end, we analyze over 2.2 million replies on X and introduce Commenotes, a two-phase framework that filters and synthesizes comments to facilitate fact-check delivery. Our framework reveals that fact-checking comments appear early and sufficiently: 99.3\% of misleading posts receive debunking comments within the initial two hours since post publication, with synthesized \textit{commenotes} successfully earning user trust for 85.8\% of those posts. Additionally, a user study (N=144) found that the synthesized commenotes were often preferred, with the best-performing model achieving a 70.1\% win rate over human notes and being rated as significantly more helpful.
中文: 社区事实核查可通过合成用户有机评论中的辟谣信息来提升时效性,Commenotes框架显示99.3%的误导性帖子在两小时内获得辟谣评论,且合成笔记因更具帮助性而常被用户青睐。
English: Community-based fact-checking can be enhanced by synthesizing timely debunking comments from organic user replies, with the Commenotes framework demonstrating that 99.3% of misleading posts receive such comments within two hours and synthesized notes are often preferred for their helpfulness.
Authors:Yaniv Goren, Yuval Cohen, Alexander Apartsin, Yehudit Aperstein
Abstract:
In the realm of human-computer interaction, fostering a natural dialogue between humans and machines is paramount. A key, often overlooked, component of this dialogue is the use of interjections such as "mmm" and "hmm". Despite their frequent use to express agreement, hesitation, or requests for information, these interjections are typically dismissed as "non-words" by Automatic Speech Recognition (ASR) engines. Addressing this gap, we introduce a novel task dedicated to interjection classification, a pioneer in the field to our knowledge. This task is challenging due to the short duration of interjection signals and significant inter- and intra-speaker variability. In this work, we present and publish a dataset of interjection signals collected specifically for interjection classification. We employ this dataset to train and evaluate a baseline deep learning model. To enhance performance, we augment the training dataset using techniques such as tempo and pitch transformation, which significantly improve classification accuracy, making models more robust. The interjection dataset, a Python library for the augmentation pipeline, baseline model, and evaluation scripts, are available to the research community.
Authors:Shuning Zhang, Ying Ma, Jingruo Chen, Simin Li, Xin Yi, Hewu Li
Abstract:
The proliferation of AI agents, with their complex and context-dependent actions, renders conventional privacy paradigms obsolete. This position paper argues that the current model of privacy management, rooted in a user's unilateral control over a passive tool, is inherently mismatched with the dynamic and interactive nature of AI agents. We contend that ensuring effective privacy protection necessitates that the agents proactively align with users' privacy preferences instead of passively waiting for the user to control. To ground this shift, and using personalized conversational recommendation agents as a case, we propose a conceptual framework built on Contextual Integrity (CI) theory and Privacy Calculus theory. This synthesis first reframes automatically controlling users' privacy as an alignment problem, where AI agents initially did not know users' preferences, and would learn their privacy preferences through implicit or explicit feedback. Upon receiving the preference feedback, the agents used alignment and Pareto optimization for aligning preferences and balancing privacy and utility. We introduced formulations and instantiations, potential applications, as well as five challenges.
中文摘要:本文认为传统隐私模式不适用于AI智能体,提出基于情境完整性和隐私计算理论的新框架,使智能体能够主动学习并适应用户的隐私偏好。
English Summary: The abstract argues that traditional privacy models are inadequate for AI agents and proposes a new framework where agents proactively learn and align with users' privacy preferences through contextual integrity and privacy calculus theories.
Authors:Shuning Zhang, Rongjun Ma, Ying Ma, Shixuan Li, Yiqun Xu, Xin Yi, Hewu Li
Abstract:
Large Language Models (LLMs) are increasingly integrating memory functionalities to provide personalized and context-aware interactions. However, user understanding, practices and expectations regarding these memory systems are not yet well understood. This paper presents a thematic analysis of semi-structured interviews with 18 users to explore their mental models of LLM's Retrieval Augmented Generation (RAG)-based memory, current usage practices, perceived benefits and drawbacks, privacy concerns and expectations for future memory systems. Our findings reveal diverse and often incomplete mental models of how memory operates. While users appreciate the potential for enhanced personalization and efficiency, significant concerns exist regarding privacy, control and the accuracy of remembered information. Users express a desire for granular control over memory generation, management, usage and updating, including clear mechanisms for reviewing, editing, deleting and categorizing memories, as well as transparent insight into how memories and inferred information are used. We discuss design implications for creating more user-centric, transparent, and trustworthy LLM memory systems.
中文摘要:本研究探讨用户对大型语言模型记忆系统的认知,发现用户对记忆机制理解不全面,虽认可个性化优势但高度关注隐私与控制问题,为构建透明可信的记忆系统提供了设计启示。
English Summary: This study explores user perceptions of LLM memory systems, revealing incomplete mental models alongside appreciation for personalization but significant privacy and control concerns, with design implications for more transparent and trustworthy systems.
Authors:Shuning Zhang, Gengrui Zhang, Yibo Meng, Ziyi Zhang, Hantao Zhao, Xin Yi, Hewu Li
Abstract:
The rapid advancement of Visual Language Models (VLMs) has enabled sophisticated analysis of visual content, leading to concerns about the inference of sensitive user attributes and subsequent privacy risks. While technical capabilities of VLMs are increasingly studied, users' understanding, perceptions, and reactions to these inferences remain less explored, especially concerning videos uploaded on the social media. This paper addresses this gap through a semi-structured interview (N=17), investigating user perspectives on VLM-driven sensitive attribute inference from their visual data. Findings reveal that users perceive VLMs as capable of inferring a range of attributes, including location, demographics, and socioeconomic indicators, often with unsettling accuracy. Key concerns include unauthorized identification, misuse of personal information, pervasive surveillance, and harm from inaccurate inferences. Participants reported employing various mitigation strategies, though with skepticism about their ultimate effectiveness against advanced AI. Users also articulate clear expectations for platforms and regulators, emphasizing the need for enhanced transparency, user control, and proactive privacy safeguards. These insights are crucial for guiding the development of responsible AI systems, effective privacy-enhancing technologies, and informed policymaking that aligns with user expectations and societal values.
Visual Language Models raise significant privacy concerns by inferring sensitive user attributes from visual data, prompting users to demand greater transparency, control, and regulatory safeguards against potential misuse and surveillance.
English Summary:
Authors:Shuning Zhang, Han Chen, Yabo Wang, Yiqun Xu, Jiaqi Bai, Yuanyuan Wu, Shixuan Li, Xin Yi, Chunhui Wang, Hewu Li
Abstract:
Pervasive voice interaction enables deceptive patterns through subtle voice characteristics, yet empirical investigation into this manipulation lags behind, especially within major non-English language contexts. Addressing this gap, our study presents the first systematic investigation into voice characteristic-based dark patterns employing female synthetic voices in Mandarin Chinese. This focus is crucial given the prevalence of female personas in commercial assistants and the prosodic significance in the Chinese language. Guided by the conceptual framework identifying key influencing factors, we systematically evaluate effectiveness variations by manipulating voice characteristics (five characteristics, three intensities) across different scenarios (shopping vs. question-answering) with different commercial aims. A preliminary study (N=24) validated the experimental materials and the main study (N=36) revealed significant behavioral manipulation (up to +2027.6%). Crucially, the analysis showed that effectiveness varied significantly with voice characteristics and scenario, mediated by user perception (of tone, intonation, timbre) and user demographics (individual preferences, though limited demographic impact). These interconnected findings offer evidence-based insights for ethical design.
中文摘要:本研究首次系统探讨了基于女性合成普通话语音的暗黑模式,揭示了语音特征、使用场景、用户感知及人口统计学因素共同影响行为操纵效果的关键发现。
English Summary: This study pioneers the systematic investigation of voice-based dark patterns using female synthetic Mandarin voices, revealing significant behavioral manipulation influenced by voice characteristics, scenarios, user perception, and demographics.
Authors:Shuning Zhang, Ying Ma, Yongquan `Owen' Hu, Ting Dang, Hong Jia, Xin Yi, Hewu Li
Abstract:
Online medical consultation platforms, while convenient, are undermined by significant privacy risks that erode user trust. We first conducted in-depth semi-structured interviews with 12 users to understand their perceptions of security and privacy landscapes on online medical consultation platforms, as well as their practices, challenges and expectation. Our analysis reveals a critical disconnect between users' desires for anonymity and control, and platform realities that offload the responsibility of ``privacy labor''. To bridge this gap, we present SafeShare, an interaction technique that leverages localized LLM to redact consultations in real-time. SafeShare balances utility and privacy through selectively anonymize private information. A technical evaluation of SafeShare's core PII detection module on 3 dataset demonstrates high efficacy, achieving 89.64\% accuracy with Qwen3-4B on IMCS21 dataset.
中文: 在线医疗咨询平台存在严重的隐私风险,为此开发了SafeShare工具,利用本地化大语言模型实时匿名化敏感信息,在保护隐私的同时保持实用性,并展现出高准确率。
English: Online medical consultation platforms face significant privacy risks that undermine user trust, prompting the development of SafeShare, a real-time redaction tool using localized LLMs to balance privacy and utility with high accuracy.
Authors:Shuning Zhang, Ying Ma, Xin Yi, Hewu Li
Abstract:
The proliferation of visual sensors in smart home environments, particularly through wearable devices like smart glasses, introduces profound privacy challenges. Existing privacy controls are often static and coarse-grained, failing to accommodate the dynamic and socially nuanced nature of home environments. This paper investigates the viability of using Large Language Models (LLMs) as the core of a dynamic and adaptive privacy policy engine. We propose a conceptual framework where visual data is classified using a multi-dimensional schema that considers data sensitivity, spatial context, and social presence. An LLM then reasons over this contextual information to enforce fine-grained privacy rules, such as selective object obfuscation, in real-time. Through a comparative evaluation of state-of-the-art Vision Language Models (including GPT-4o and the Qwen-VL series) in simulated home settings , our findings show the feasibility of this approach. The LLM-based engine achieved a top machine-evaluated appropriateness score of 3.99 out of 5, and the policies generated by the models received a top human-evaluated score of 4.00 out of 5.
中文:本文提出了一种利用大型语言模型的动态隐私框架,在智能家居环境中实施细粒度视觉数据保护,其策略在机器和人工评估中均获得高分。
English: This paper proposes a dynamic privacy framework using Large Language Models to enforce fine-grained visual data protection in smart homes, achieving high machine and human evaluation scores for policy appropriateness.
Authors:Pat Pataranutaporn, Nattavudh Powdthavee, Chayapatr Archiwaranguprok, Pattie Maes
Abstract:
Subjective well-being is a key metric in economic, medical, and policy decision-making. As artificial intelligence provides scalable tools for modelling human outcomes, it is crucial to evaluate whether large language models (LLMs) can accurately predict well-being across diverse global populations. We evaluate four leading LLMs using data from 64,000 individuals in 64 countries. While LLMs capture broad correlates such as income and health, their predictive accuracy decreases in countries underrepresented in the training data, highlighting systematic biases rooted in global digital and economic inequality. A pre-registered experiment demonstrates that LLMs rely on surface-level linguistic similarity rather than conceptual understanding, leading to systematic misestimations in unfamiliar or resource-limited settings. Injecting findings from underrepresented contexts substantially enhances performance, but a significant gap remains. These results highlight both the promise and limitations of LLMs in predicting global well-being, underscoring the importance of robust validation prior to their implementation across these areas.
Chinese: 大型语言模型在预测主观幸福感方面展现出潜力,但对数据代表性不足的群体存在系统性偏差,实际应用前需进行严格验证。
English: Large language models show promise in predicting subjective well-being but exhibit systematic biases against underrepresented populations, requiring robust validation before implementation.
Authors:Shuning Zhang, Hui Wang, Xin Yi
Abstract:
As Artificial Intelligence (AI) increasingly becomes an active collaborator in co-creation, understanding the distribution and dynamic of agency is paramount. The Human-Computer Interaction (HCI) perspective is crucial for this analysis, as it uniquely reveals the interaction dynamics and specific control mechanisms that dictate how agency manifests in practice. Despite this importance, a systematic synthesis mapping agency configurations and control mechanisms within the HCI/CSCW literature is lacking. Addressing this gap, we reviewed 134 papers from top-tier HCI/CSCW venues (e.g., CHI, UIST, CSCW) over the past 20 years. This review yields four primary contributions: (1) an integrated theoretical framework structuring agency patterns, control mechanisms, and interaction contexts, (2) a comprehensive operational catalog of control mechanisms detailing how agency is implemented; (3) an actionable cross-context map linking agency configurations to diverse co-creative practices; and (4) grounded implications and guidance for future CSCW research and the design of co-creative systems, addressing aspects like trust and ethics.
中文摘要:本研究通过分析134篇人机交互文献,构建了人类与AI协同创作中的代理权配置与控制机制框架,为未来协同系统的设计与伦理考量提供了实践指导。
English Summary: This study synthesizes 134 HCI/CSCW papers to develop a framework mapping agency configurations and control mechanisms in human-AI co-creation, offering practical guidance for future system design.
Authors:Jingjing Qu, Kejia Hu, Jun Zhu, Wenhao Li, Teng Wang, Zhiyun Chen, Yulei Ye, Chaochao Lu, Aimin Zhou, Xiangfeng Wang, James Evans
Abstract:
The integration of Large Language Models (LLMs) into social science experiments represents a transformative approach to understanding human-AI interactions and their societal impacts. We introduce Epitome, the world's first open experimental platform dedicated to the deep integration of artificial intelligence and social science. Rooted in theoretical foundations from management, communication studies, sociology, psychology, and ethics, Epitome focuses on the interactive impacts of AI on individuals, organizations, and society during its real-world deployment. It constructs a theoretical support system through cross-disciplinary experiments. The platform offers a one-stop comprehensive experimental solution spanning "foundation models-complex application development-user feedback" through seven core modules, while embedding the classical "control-comparison-comparative causal logic" of social science experiments into multilevel human-computer interaction environments, including dialogues, group chats, and multi-agent virtual scenarios. With its canvas-style, user-friendly interface, Epitome enables researchers to easily design and run complex experimental scenarios, facilitating systematic investigations into the social impacts of AI and exploration of integrated solutions.To demonstrate its capabilities, we replicated three seminal social science experiments involving LLMs, showcasing Epitome's potential to streamline complex experimental designs and produce robust results, suitable for publishing in the top selective journals. Our findings highlight the platform's utility in enhancing the efficiency and quality of human-AI interactions, providing valuable insights into the societal implications of AI technologies. Epitome thus offers a powerful tool for advancing interdisciplinary research at the intersection of AI and social science, with potential applications in policy-making, ...
中文摘要:Epitome是全球首个深度融合人工智能与社会科学的开放实验平台,通过跨学科实验和直观界面助力研究人员探索人机交互的社会影响,推动前沿学术研究。
English Summary: Epitome is the first open platform integrating AI with social science to study human-AI interactions through multidisciplinary experiments, offering user-friendly tools for designing complex scenarios and generating robust research outcomes.
Authors:Jacopo Nudo, Mario Edoardo Pandolfo, Edoardo Loru, Mattia Samory, Matteo Cinelli, Walter Quattrociocchi
Abstract:
We investigate how Large Language Models (LLMs) behave when simulating political discourse on social media. Leveraging 21 million interactions on X during the 2024 U.S. presidential election, we construct LLM agents based on 1,186 real users, prompting them to reply to politically salient tweets under controlled conditions. Agents are initialized either with minimal ideological cues (Zero Shot) or recent tweet history (Few Shot), allowing one-to-one comparisons with human replies. We evaluate three model families (Gemini, Mistral, and DeepSeek) across linguistic style, ideological consistency, and toxicity. We find that richer contextualization improves internal consistency but also amplifies polarization, stylized signals, and harmful language. We observe an emergent distortion that we call "generation exaggeration": a systematic amplification of salient traits beyond empirical baselines. Our analysis shows that LLMs do not emulate users, they reconstruct them. Their outputs, indeed, reflect internal optimization dynamics more than observed behavior, introducing structural biases that compromise their reliability as social proxies. This challenges their use in content moderation, deliberative simulations, and policy modeling.
中文: 研究发现,大语言模型在模拟政治讨论时会通过"生成夸张"效应加剧极化与有害内容,其输出更多反映算法内部优化而非真实用户行为,这削弱了其作为社会模拟工具的可靠性。
English: This study reveals that LLMs simulating political discourse amplify polarization and harmful content through "generation exaggeration," reflecting internal algorithmic biases rather than genuine user behavior, which undermines their reliability for social applications.
Authors:Jiakai Tang, Yujie Luo, Xunke Xi, Fei Sun, Xueyang Feng, Sunhao Dai, Chao Yi, Dian Chen, Zhujin Gao, Yang Li, Xu Chen, Wen Chen, Jian Wu, Yuning Jiang, Bo Zheng
Abstract:
Traditional recommender systems rely on passive feedback mechanisms that limit users to simple choices such as like and dislike. However, these coarse-grained signals fail to capture users' nuanced behavior motivations and intentions. In turn, current systems cannot also distinguish which specific item attributes drive user satisfaction or dissatisfaction, resulting in inaccurate preference modeling. These fundamental limitations create a persistent gap between user intentions and system interpretations, ultimately undermining user satisfaction and harming system effectiveness. To address these limitations, we introduce the Interactive Recommendation Feed (IRF), a pioneering paradigm that enables natural language commands within mainstream recommendation feeds. Unlike traditional systems that confine users to passive implicit behavioral influence, IRF empowers active explicit control over recommendation policies through real-time linguistic commands. To support this paradigm, we develop RecBot, a dual-agent architecture where a Parser Agent transforms linguistic expressions into structured preferences and a Planner Agent dynamically orchestrates adaptive tool chains for on-the-fly policy adjustment. To enable practical deployment, we employ simulation-augmented knowledge distillation to achieve efficient performance while maintaining strong reasoning capabilities. Through extensive offline and long-term online experiments, RecBot shows significant improvements in both user satisfaction and business outcomes.
中文: 传统推荐系统依赖被动反馈机制,难以捕捉用户细粒度偏好,而提出的交互式推荐信息流(IRF)通过其RecBot双智能体架构实现了基于自然语言的主动控制,借助实时策略调整显著提升了用户满意度和系统效果。
English: Traditional recommender systems' reliance on passive feedback limits nuanced user preference capture, but the proposed Interactive Recommendation Feed (IRF) with its RecBot dual-agent architecture enables active linguistic control, significantly improving satisfaction and outcomes through real-time policy adjustments.
Authors:Eason Chen, Chuangji Li, Shizhuo Li, Zimo Xiao, Jionghao Lin, Kenneth R. Koedinger
Abstract:
Technology-enhanced learning environments often help students retrieve relevant learning content for questions arising during self-paced study. Large language models (LLMs) have emerged as novel aids for information retrieval during learning. While LLMs are effective for general-purpose question-answering, they typically lack alignment with the domain knowledge of specific course materials such as textbooks and slides. We investigate Retrieval-Augmented Generation (RAG) and GraphRAG, a knowledge graph-enhanced RAG approach, for page-level question answering in an undergraduate mathematics textbook. While RAG has been effective for retrieving discrete, contextually relevant passages, GraphRAG may excel in modeling interconnected concepts and hierarchical knowledge structures. We curate a dataset of 477 question-answer pairs, each tied to a distinct textbook page. We then compare the standard embedding-based RAG methods to GraphRAG for evaluating both retrieval accuracy-whether the correct page is retrieved-and generated answer quality via F1 scores. Our findings show that embedding-based RAG achieves higher retrieval accuracy and better F1 scores compared to GraphRAG, which tends to retrieve excessive and sometimes irrelevant content due to its entity-based structure. We also explored re-ranking the retrieved pages with LLM and observed mixed results, including performance drop and hallucinations when dealing with larger context windows. Overall, this study highlights both the promises and challenges of page-level retrieval systems in educational contexts, emphasizing the need for more refined retrieval methods to build reliable AI tutoring solutions in providing reference page numbers.
中文: 本研究比较了检索增强生成(RAG)和图增强RAG(GraphRAG)的教材页面检索效果,发现标准RAG在准确性和答案质量上更优,同时揭示了教育AI系统在提供精确页面参考时面临的挑战。
English: This study compares Retrieval-Augmented Generation (RAG) and GraphRAG for textbook page retrieval, finding that standard RAG outperforms GraphRAG in accuracy and answer quality while highlighting the challenges of educational AI systems.
Authors:Ehsan Latif, Zirak Khan, Xiaoming Zhai
Abstract:
Scientific sketches (e.g., models) offer a powerful lens into students' conceptual understanding, yet AI-powered automated assessment of such free-form, visually diverse artifacts remains a critical challenge. Existing solutions often treat sketch evaluation as either an image classification task or monolithic vision-language models, which lack interpretability, pedagogical alignment, and adaptability across cognitive levels. To address these limitations, we present SketchMind, a cognitively grounded, multi-agent framework for evaluating and improving student-drawn scientific sketches. SketchMind comprises modular agents responsible for rubric parsing, sketch perception, cognitive alignment, and iterative feedback with sketch modification, enabling personalized and transparent evaluation. We evaluate SketchMind on a curated dataset of 3,575 student-generated sketches across six science assessment items with different highest order of Bloom's level that require students to draw models to explain phenomena. Compared to baseline GPT-4o performance without SRG (average accuracy: 55.6%), and with SRG integration achieves 77.1% average accuracy (+21.4% average absolute gain). We also demonstrate that multi-agent orchestration with SRG enhances SketchMind performance, for example, GPT-4.1 gains an average 8.9% increase in sketch prediction accuracy, outperforming single-agent pipelines across all items. Human evaluators rated the feedback and co-created sketches generated by \textsc{SketchMind} with GPT-4.1, which achieved an average of 4.1 out of 5, significantly higher than those of baseline models (e.g., 2.3 for GPT-4o). Experts noted the system's potential to meaningfully support conceptual growth through guided revision. Our code and (pending approval) dataset will be released to support reproducibility and future research in AI-driven education.
Chinese: SketchMind提出了一种基于认知的多智能体框架,通过模块化评估和迭代反馈,显著提升了AI对学生科学草图评估的准确性、可解释性和个性化指导能力。
English: SketchMind introduces a cognitively grounded, multi-agent framework that significantly improves AI-based evaluation of student-drawn scientific sketches by enhancing accuracy, interpretability, and personalized feedback compared to existing methods.
Authors:Efe Bozkir, Babette Bühler, Xiaoyuan Wu, Enkelejda Kasneci, Lujo Bauer, Lorrie Faith Cranor
Abstract:
Augmented reality technology will likely be prevalent with more affordable head-mounted displays. Integrating novel interaction modalities such as eye trackers into head-mounted displays could lead to collecting vast amounts of biometric data, which may allow inference of sensitive user attributes like health status or sexual preference, posing privacy issues. While previous works broadly examined privacy concerns about augmented reality, ours is the first to extensively explore privacy concerns on behavioral data, particularly eye tracking in augmented reality. We crowdsourced four survey studies in the United States (n1 = 48, n2 = 525) and Germany (n3 = 48, n4 = 525) to understand the impact of user attributes, augmented reality devices, use cases, data practices, and country on privacy concerns. Our findings indicate that participants are generally concerned about privacy when they know what inferences can be made based on the collected data. Despite the more prominent use of smartphones in daily life than augmented reality glasses, we found no indications of differing privacy concerns depending on the device type. In addition, our participants are more comfortable when a particular use case benefits them and less comfortable when other humans can consume their data. Furthermore, participants in the United States are less concerned about their privacy than those in Germany. Based on our findings, we provide several recommendations to practitioners and policymakers for privacy-aware augmented reality.
Chinese: 增强现实技术结合眼动追踪可能通过推断敏感用户特征引发隐私风险,研究显示美国和德国参与者了解数据推断后普遍担忧隐私,且舒适度受使用场景和国籍影响,美国参与者比德国参与者更不关注隐私。
English: Augmented reality's integration of eye tracking raises privacy risks by potentially inferring sensitive user traits, with studies in the U.S. and Germany showing heightened concerns when data inferences are understood and varying comfort levels based on use cases and nationality.
Authors:Leixian Shen, Leni Yang, Haotian Li, Yun Wang, Yuyu Luo, Huamin Qu
Abstract:
Empirical research in creative design deepens our theoretical understanding of design principles and perceptual effects, offering valuable guidance for innovating creation tools. However, how these empirical insights currently influence the development of creation tools, and how their integration can be enhanced in the future, remains insufficiently understood. In this paper, we aim to unveil the gap through a case study on data videos, a prominent and wide-spread medium for effective data storytelling. To achieve the goal, we conducted a comprehensive analysis of 46 empirical research papers and 48 creation tool papers on data video, complemented by interviews with 11 experts. Building upon a systematic collection and structured characterization of empirical research by their methodologies (e.g., corpus analysis, comparative evaluations) and component focus (e.g., visuals, motions, narratives, audio), we conducted a context-aware citation analysis and revealed a taxonomy of recurring patterns in how empirical findings inform tool design across citation functions (e.g., problem framing, technical reference). Expert interviews further uncovered researchers' practice patterns in applying empirical findings (e.g., adaptation, synthesis, iteration, etc.) and identified key factors influencing applicability, such as contextual relevance, granularity matching, clarity, credibility, and feasibility. Finally, we derive suggestions and discuss future opportunities to foster closer mutual engagement between empirical and tool research, aiming to reinforce the theoretical grounding of creation tools and enhance the practical impact of empirical research.
中文摘要:本研究通过数据视频案例揭示了实证研究与创作工具开发之间的应用鸿沟,系统分析了引用模式与专家实践,并提出加强两者协同发展的具体路径。
English Summary: This study investigates the gap between empirical design research and creation tool development through data video analysis, revealing citation patterns and expert practices while proposing strategies to strengthen their mutual integration.
Authors:Shiye Cao, Maia Stiber, Amama Mahmood, Maria Teresa Parreira, Wendy Ju, Micol Spitale, Hatice Gunes, Chien-Ming Huang
Abstract:
The integration of large language models (LLMs) into conversational robots has made human-robot conversations more dynamic. Yet, LLM-powered conversational robots remain prone to errors, e.g., misunderstanding user intent, prematurely interrupting users, or failing to respond altogether. Detecting and addressing these failures is critical for preventing conversational breakdowns, avoiding task disruptions, and sustaining user trust. To tackle this problem, the ERR@HRI 2.0 Challenge provides a multimodal dataset of LLM-powered conversational robot failures during human-robot conversations and encourages researchers to benchmark machine learning models designed to detect robot failures. The dataset includes 16 hours of dyadic human-robot interactions, incorporating facial, speech, and head movement features. Each interaction is annotated with the presence or absence of robot errors from the system perspective, and perceived user intention to correct for a mismatch between robot behavior and user expectation. Participants are invited to form teams and develop machine learning models that detect these failures using multimodal data. Submissions will be evaluated using various performance metrics, including detection accuracy and false positive rate. This challenge represents another key step toward improving failure detection in human-robot interaction through social signal analysis.
中文: ERR@HRI 2.0挑战赛通过提供多模态对话机器人故障数据集,推动机器学习模型在检测人机交互故障方面的研究,以增强系统可靠性和用户信任度。
English: The ERR@HRI 2.0 Challenge introduces a multimodal dataset of conversational robot failures to advance machine learning models for detecting errors in human-robot interactions, thereby improving reliability and user trust.
Authors:Tao Han, Ang Li, Qinyu Chen, Chang Gao
Abstract:
Eye tracking has become a key technology for gaze-based interactions in Extended Reality (XR). However, conventional frame-based eye-tracking systems often fall short of XR's stringent requirements for high accuracy, low latency, and energy efficiency. Event cameras present a compelling alternative, offering ultra-high temporal resolution and low power consumption. In this paper, we present JaneEye, an energy-efficient event-based eye-tracking hardware accelerator designed specifically for wearable devices, leveraging sparse, high-temporal-resolution event data. We introduce an ultra-lightweight neural network architecture featuring a novel ConvJANET layer, which simplifies the traditional ConvLSTM by retaining only the forget gate, thereby halving computational complexity without sacrificing temporal modeling capability. Our proposed model achieves high accuracy with a pixel error of 2.45 on the 3ET+ dataset, using only 17.6K parameters, with up to 1250 Hz event frame rate. To further enhance hardware efficiency, we employ custom linear approximations of activation functions (hardsigmoid and hardtanh) and fixed-point quantization. Through software-hardware co-design, our 12-nm ASIC implementation operates at 400 MHz, delivering an end-to-end latency of 0.5 ms (equivalent to 2000 Frames Per Second (FPS)) at an energy efficiency of 18.9 $μ$J/frame. JaneEye sets a new benchmark in low-power, high-performance eye-tracking solutions suitable for integration into next-generation XR wearables.
Chinese: JaneEye是一种基于事件的眼动追踪硬件加速器,采用超轻量神经网络和定制硬件优化,为下一代XR可穿戴设备实现了高精度、低延迟和高能效的性能。
English: JaneEye is an event-based eye-tracking hardware accelerator that uses an ultra-lightweight neural network and custom hardware optimizations to achieve high accuracy, low latency, and energy efficiency for next-generation XR wearables.
Authors:Mubaris Nadeem, Johannes Zenkert, Lisa Bender, Christian Weber, Madjid Fathi
Abstract:
In emergencies, treatment needs to be fast, accu-rate and patient-specific. For instance, in emergency scenarios, obstacles like treatment environments and medical difficulties can lead to bad outcomes for patients. Additionally, a drastic change of health vitals can force paramedics to shift to a different treatment in the ongoing treatment of the patient in order to save a patient's life. The KIRETT (engl.: 'Artificial intelligence in rescue operations') demonstrator is developed to provide a rescue operator with a wrist-worn device, enabling treatment recommendation (with the help of knowledge graph) with situation detection models to improve the emergency treatment of a patient. This paper aims to provide a qualitative evaluation of the 2-days testing in the KIRETT project with the focus of knowledge graphs, knowledge fusion, and user-experience-design (UX-design).
Authors:Elahe Vedadi, David Barrett, Natalie Harris, Ellery Wulczyn, Shashir Reddy, Roma Ruparel, Mike Schaekermann, Tim Strother, Ryutaro Tanno, Yash Sharma, Jihyeon Lee, CÃan Hughes, Dylan Slack, Anil Palepu, Jan Freyberg, Khaled Saab, Valentin Liévin, Wei-Hung Weng, Tao Tu, Yun Liu, Nenad Tomasev, Kavita Kulkarni, S. Sara Mahdavi, Kelvin Guu, Joëlle Barral, Dale R. Webster, James Manyika, Avinatan Hassidim, Katherine Chou, Yossi Matias, Pushmeet Kohli, Adam Rodman, Vivek Natarajan, Alan Karthikesalingam, David Stutz
Abstract:
Recent work has demonstrated the promise of conversational AI systems for diagnostic dialogue. However, real-world assurance of patient safety means that providing individual diagnoses and treatment plans is considered a regulated activity by licensed professionals. Furthermore, physicians commonly oversee other team members in such activities, including nurse practitioners (NPs) or physician assistants/associates (PAs). Inspired by this, we propose a framework for effective, asynchronous oversight of the Articulate Medical Intelligence Explorer (AMIE) AI system. We propose guardrailed-AMIE (g-AMIE), a multi-agent system that performs history taking within guardrails, abstaining from individualized medical advice. Afterwards, g-AMIE conveys assessments to an overseeing primary care physician (PCP) in a clinician cockpit interface. The PCP provides oversight and retains accountability of the clinical decision. This effectively decouples oversight from intake and can thus happen asynchronously. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) of text consultations with asynchronous oversight, we compared g-AMIE to NPs/PAs or a group of PCPs under the same guardrails. Across 60 scenarios, g-AMIE outperformed both groups in performing high-quality intake, summarizing cases, and proposing diagnoses and management plans for the overseeing PCP to review. This resulted in higher quality composite decisions. PCP oversight of g-AMIE was also more time-efficient than standalone PCP consultations in prior work. While our study does not replicate existing clinical practices and likely underestimates clinicians' capabilities, our results demonstrate the promise of asynchronous oversight as a feasible paradigm for diagnostic AI systems to operate under expert human oversight for enhancing real-world care.
中文: 提出的g-AMIE系统通过异步监督机制,在避免直接诊断的前提下实现AI辅助病史采集,临床模拟显示其在遵循相同安全规范时表现优于人类医疗提供者。
English: The proposed g-AMIE system enables asynchronous physician oversight of AI-driven medical history collection while abstaining from direct diagnosis, demonstrating superior performance in clinical simulations compared to human providers under equivalent safety constraints.
Authors:Chenpei Huang, Lingfeng Yao, Hui Zhong, Kyu In Lee, Lan Zhang, Xiaoyong Yuan, Tomoaki Ohtsuki, Miao Pan
Abstract:
Ear canal scanning/sensing (ECS) has emerged as a novel biometric authentication method for mobile devices paired with wireless earbuds. Existing studies have demonstrated the uniqueness of ear canals by training and testing machine learning classifiers on ECS data. However, implementing practical ECS-based authentication requires preventing raw biometric data leakage and designing computationally efficient protocols suitable for resource-constrained earbuds. To address these challenges, we propose an ear canal key extraction protocol, \textbf{EarID}. Without relying on classifiers, EarID extracts unique binary keys directly on the earbuds during authentication. These keys further allow the use of privacy-preserving fuzzy commitment scheme that verifies the wearer's key on mobile devices. Our evaluation results demonstrate that EarID achieves a 98.7\% authentication accuracy, comparable to machine learning classifiers. The mobile enrollment time (160~ms) and earbuds processing time (226~ms) are negligible in terms of wearer's experience. Moreover, our approach is robust and attack-resistant, maintaining a false acceptance rate below 1\% across all adversarial scenarios. We believe the proposed EarID offers a practical and secure solution for next-generation wireless earbuds.
Authors:Alexandra Klymenko, Stephen Meisenbacher, Patrick Gage Kelley, Sai Teja Peddinti, Kurt Thomas, Florian Matthes
Abstract:
The proliferation of AI has sparked privacy concerns related to training data, model interfaces, downstream applications, and more. We interviewed 25 AI developers based in Europe to understand which privacy threats they believe pose the greatest risk to users, developers, and businesses and what protective strategies, if any, would help to mitigate them. We find that there is little consensus among AI developers on the relative ranking of privacy risks. These differences stem from salient reasoning patterns that often relate to human rather than purely technical factors. Furthermore, while AI developers are aware of proposed mitigation strategies for addressing these risks, they reported minimal real-world adoption. Our findings highlight both gaps and opportunities for empowering AI developers to better address privacy risks in AI.
Chinese: 研究表明,人工智能开发者对隐私风险的优先排序缺乏共识,这更多源于人为因素而非技术原因,且缓解策略的实际应用极少,尽管他们有所认知,凸显了改进的潜力。
English: The study reveals a lack of consensus among AI developers on privacy risk priorities, driven by human factors rather than technical ones, and minimal adoption of mitigation strategies despite awareness, highlighting opportunities for improvement.
Authors:Litao Yan, Andrew Head, Ken Milne, Vu Le, Sumit Gulwani, Chris Parnin, Emerson Murphy-Hill
Abstract:
Many users struggle to notice when a more efficient workflow exists in feature-rich tools like Excel. Existing AI assistants offer help only after users describe their goals or problems, which can be effortful and imprecise. We present InvisibleMentor, a system that turns screen recordings of task completion into vision-grounded reflections on tasks. It detects issues such as repetitive edits and recommends more efficient alternatives based on observed behavior. Unlike prior systems that rely on logs, APIs, or user prompts, InvisibleMentor operates directly on screen recordings. It uses a two-stage pipeline: a vision-language model reconstructs actions and context, and a language model generates structured, high-fidelity suggestions. In evaluation, InvisibleMentor accurately identified inefficient workflows, and participants found its suggestions more actionable, tailored, and more helpful for learning and improvement compared to a prompt-based spreadsheet assistant.
中文: InvisibleMentor系统通过分析屏幕录像自动检测低效工作流程(如重复编辑),无需用户输入即可提供针对性效率建议,相比基于提示的助手更具可操作性和帮助性。
English: InvisibleMentor is a system that analyzes screen recordings to automatically detect inefficient workflows, such as repetitive edits, and provides tailored efficiency recommendations without requiring user input, proving more actionable and helpful than prompt-based assistants.
Authors:Shutong Fan, Lan Zhang, Xiaoyong Yuan
Abstract:
As Artificial Intelligence (AI) increasingly supports human decision-making, its vulnerability to adversarial attacks grows. However, the existing adversarial analysis predominantly focuses on fully autonomous AI systems, where decisions are executed without human intervention. This narrow focus overlooks the complexities of human-AI collaboration, where humans interpret, adjust, and act upon AI-generated decisions. Trust, expectations, and cognitive behaviors influence how humans interact with AI, creating dynamic feedback loops that adversaries can exploit. To strengthen the robustness of AI-assisted decision-making, adversarial analysis must account for the interplay between human factors and attack strategies. This position paper argues that human factors fundamentally reshape adversarial analysis and must be incorporated into evaluating robustness in human-AI decision-making systems. To fully explore human factors in adversarial analysis, we begin by investigating the role of human factors in human-AI collaboration through a comprehensive review. We then introduce a novel robustness analysis framework that (1) examines how human factors affect collaborative decision-making performance, (2) revisits and interprets existing adversarial attack strategies in the context of human-AI interaction, and (3) introduces a new timing-based adversarial attack as a case study, illustrating vulnerabilities emerging from sequential human actions. The experimental results reveal that attack timing uniquely impacts decision outcomes in human-AI collaboration. We hope this analysis inspires future research on adversarial robustness in human-AI systems, fostering interdisciplinary approaches that integrate AI security, human cognition, and decision-making dynamics.
Authors:Tianjian Liu, Fanqi Wan, Jiajian Guo, Xiaojun Quan
Abstract:
Proactive dialogue has emerged as a critical and challenging research problem in advancing large language models (LLMs). Existing works predominantly focus on domain-specific or task-oriented scenarios, which leads to fragmented evaluations and limits the comprehensive exploration of models' proactive conversation abilities. In this work, we propose ProactiveEval, a unified framework designed for evaluating proactive dialogue capabilities of LLMs. This framework decomposes proactive dialogue into target planning and dialogue guidance, establishing evaluation metrics across various domains. Moreover, it also enables the automatic generation of diverse and challenging evaluation data. Based on the proposed framework, we develop 328 evaluation environments spanning 6 distinct domains. Through experiments with 22 different types of LLMs, we show that DeepSeek-R1 and Claude-3.7-Sonnet exhibit exceptional performance on target planning and dialogue guidance tasks, respectively. Finally, we investigate how reasoning capabilities influence proactive behaviors and discuss their implications for future model development.
Chinese: 本文提出了ProactiveEval框架,通过将主动对话分解为目标规划和对话引导来评估大语言模型,实验发现DeepSeek-R1和Claude-3.7-Sonnet分别在不同任务中表现优异,并探讨了推理能力对主动行为的影响。
English: This paper introduces ProactiveEval, a unified framework for evaluating proactive dialogue in LLMs by decomposing it into target planning and dialogue guidance, and through extensive testing, identifies DeepSeek-R1 and Claude-3.7-Sonnet as top performers while exploring the role of reasoning in proactive behaviors.
Authors:Minghao Tu, Chun Yu, Xiyuan Shen, Zhi Zheng, Li Chen, Yuanchun Shi
Abstract:
Text boxes serve as portals to diverse functionalities in today's smartphone applications. However, when it comes to specific functionalities, users always need to navigate through multiple steps to access particular text boxes for input. We propose TextOnly, a unified function portal that enables users to access text-related functions from various applications by simply inputting text into a sole text box. For instance, entering a restaurant name could trigger a Google Maps search, while a greeting could initiate a conversation in WhatsApp. Despite their brevity, TextOnly maximizes the utilization of these raw text inputs, which contain rich information, to interpret user intentions effectively. TextOnly integrates large language models(LLM) and a BERT model. The LLM consistently provides general knowledge, while the BERT model can continuously learn user-specific preferences and enable quicker predictions. Real-world user studies demonstrated TextOnly's effectiveness with a top-1 accuracy of 71.35%, and its ability to continuously improve both its accuracy and inference speed. Participants perceived TextOnly as having satisfactory usability and expressed a preference for TextOnly over manual executions. Compared with voice assistants, TextOnly supports a greater range of text-related functions and allows for more concise inputs.
中文: TextOnly是一个基于文本的统一功能入口,通过集成大语言模型和BERT模型解析用户输入意图,实现跨应用直接功能调用,在实测中达到71.35%的准确率并持续优化,用户体验优于传统操作方式。
English: TextOnly is a unified text-based interface that uses LLM and BERT models to interpret user inputs for direct function access across applications, achieving 71.35% top-1 accuracy and continuous improvement in usability studies.
Authors:Holli Sargeant, Mackenzie Jorgensen, Arina Shah, Adrian Weller, Umang Bhatt
Abstract:
Uncertainty in artificial intelligence (AI) predictions poses urgent legal and ethical challenges for AI-assisted decision-making. We examine two algorithmic interventions that act as guardrails for human-AI collaboration: selective abstention, which withholds high-uncertainty predictions from human decision-makers, and selective friction, which delivers those predictions together with salient warnings or disclosures that slow the decision process. Research has shown that selective abstention based on uncertainty can inadvertently exacerbate disparities and disadvantage under-represented groups that disproportionately receive uncertain predictions. In this paper, we provide the first integrated socio-technical and legal analysis of uncertainty-based algorithmic interventions. Through two case studies, AI-assisted consumer credit decisions and AI-assisted content moderation, we demonstrate how the seemingly neutral use of uncertainty thresholds can trigger discriminatory impacts. We argue that, although both interventions pose risks of unlawful discrimination under UK law, selective frictions offer a promising pathway toward fairer and more accountable AI-assisted decision-making by preserving transparency and encouraging more cautious human judgment.
中文摘要:人工智能预测中的不确定性引发法律与伦理挑战,选择性弃权可能加剧歧视,而选择性摩擦通过保持透明度和促进审慎人为判断,为实现更公平的AI辅助决策提供了可行路径。
English summary: Uncertainty in AI predictions raises legal and ethical concerns, where selective abstention risks discrimination while selective friction offers a fairer approach by maintaining transparency and promoting careful human judgment.
Authors:Vladislav Li, Ilias Siniosoglou, Panagiotis Sarigiannidis, Vasileios Argyriou
Abstract:
In contemporary training for industrial manufacturing, reconciling theoretical knowledge with practical experience continues to be a significant difficulty. As companies transition to more intricate and technology-oriented settings, conventional training methods frequently inadequately equip workers with essential practical skills while maintaining safety and efficiency. Virtual Reality has emerged as a transformational instrument to tackle this issue by providing immersive, interactive, and risk-free teaching experiences. Through the simulation of authentic industrial environments, virtual reality facilitates the acquisition of vital skills for trainees within a regulated and stimulating context, therefore mitigating the hazards linked to experiential learning in the workplace. This paper presents a sophisticated VR-based industrial training architecture aimed at improving learning efficacy via high-fidelity simulations, dynamic and context-sensitive scenarios, and adaptive feedback systems. The suggested system incorporates intuitive gesture-based controls, reducing the learning curve for users across all skill levels. A new scoring metric, namely, VR Training Scenario Score (VRTSS), is used to assess trainee performance dynamically, guaranteeing ongoing engagement and incentive. The experimental assessment of the system reveals promising outcomes, with significant enhancements in information retention, task execution precision, and overall training efficacy. The results highlight the capability of VR as a crucial instrument in industrial training, providing a scalable, interactive, and efficient substitute for conventional learning methods.
中文摘要:虚拟现实通过高保真模拟和自适应反馈系统,为工业制造提供沉浸式安全培训方案,有效提升技能掌握与培训效率。
English Summary: Virtual Reality offers an immersive and safe training solution for industrial manufacturing, enhancing skill acquisition and efficiency through high-fidelity simulations and adaptive feedback systems.
Authors:Xinyu Li, Tongguang Li, Lixiang Yan, Yuheng Li, Linxuan Zhao, Mladen RakoviÄ, Inge Molenaar, Dragan GaÅ¡eviÄ, Yizhou Fan
Abstract:
SRL, defined as learners' ability to systematically plan, monitor, and regulate their learning activities, is crucial for sustained academic achievement and lifelong learning competencies. Emerging Artificial Intelligence (AI) developments profoundly influence SRL interactions by potentially either diminishing or strengthening learners' opportunities to exercise their own regulatory skills. Recent literature emphasizes a balanced approach termed Hybrid Human-AI Regulated Learning (HHAIRL), in which AI provides targeted, timely scaffolding while preserving the learners' role as active decision-makers and reflective monitors of their learning process. Nevertheless, existing digital tools frequently fall short, lacking adaptability, focusing narrowly on isolated SRL phases, and insufficiently support meaningful human-AI interactions. In response, this paper introduces the enhanced FLoRA Engine, which incorporates advanced Generative Artificial Intelligence (GenAI) features and state-of-the-art learning analytics, explicitly grounded in SRL and HHAIRL theories. The FLoRA Engine offers instrumentation tools such as collaborative writing, multi-agents chatbot, and detailed learning trace logging to support dynamic, adaptive scaffolding tailored to individual needs in real time. We further present a summary of several research studies that provide the validations for and illustrate how these instrumentation tools can be utilized in real-world educational and experimental contexts. These studies demonstrate the effectiveness of FLoRA Engine in fostering SRL and HHAIRL, providing both theoretical insights and practical solutions for the future of AI-enhanced learning context.
中文:增强版FLoRA引擎融合生成式人工智能与学习分析技术,通过动态自适应支架支持自主调节学习与人机协同教育,多项研究已验证其有效性。
English: The enhanced FLoRA Engine integrates Generative AI and learning analytics to provide adaptive, real-time scaffolding that supports self-regulated learning and human-AI collaboration in education, as validated by multiple studies.
Authors:Yueqiao Jin, Kaixun Yang, Roberto Martinez-Maldonado, Dragan GaÅ¡eviÄ, Lixiang Yan
Abstract:
Academic writing increasingly involves multimodal tasks requiring students to integrate visual information and textual arguments. While generative AI (GenAI) tools, like ChatGPT, offer new pathways for supporting academic writing, little is known about how students' GenAI literacy influences their independent multimodal writing skills or how chatbot interaction strategies (passive reactive vs. proactive scaffolding) impact learning. This study examined 79 higher education students' multimodal academic writing performance using a comparative research design. Students completed writing tasks integrating visual data under two chatbot-assisted conditions (passive vs. proactive) and subsequently without AI assistance. Their writing performance was rigorously evaluated across five dimensions, including insightfulness, visual data integration, organisation, linguistic quality, and critical thinking. Ordinal logistic regression and correlation analyses revealed that higher levels of GenAI literacy significantly predicted stronger independent multimodal writing performance immediately after AI assistance removal, particularly for students using passive chatbots requiring active prompting. These results highlight the critical role of GenAI literacy and specific chatbot interaction strategies in shaping students' capacities for independent multimodal academic writing. Our findings emphasise the need for purposeful integration of GenAI literacy training into curricula and balancing external scaffolding support with autonomous learning opportunities. This research offers valuable recommendations for educators leveraging AI-enhanced pedagogies to optimise student writing outcomes and technological engagement strategies.
中文摘要:较高的生成式AI素养显著提升学生独立多模态写作能力,尤其是在使用被动式聊天机器人时,这凸显了将AI素养培训与平衡性教学支架融入课程的必要性。
English Summary: Higher GenAI literacy significantly enhances students' independent multimodal writing performance, especially when using passive chatbots, underscoring the need for integrating GenAI literacy training and balanced scaffolding in curricula.
Authors:Runcong Zhao, Artem Bobrov, Jiazheng Li, Yulan He
Abstract:
Effective feedback is essential for student learning but is time-intensive for teachers. We present LearnLens, a modular, LLM-based system that generates personalised, curriculum-aligned feedback in science education. LearnLens comprises three components: (1) an error-aware assessment module that captures nuanced reasoning errors; (2) a curriculum-grounded generation module that uses a structured, topic-linked memory chain rather than traditional similarity-based retrieval, improving relevance and reducing noise; and (3) an educator-in-the-loop interface for customisation and oversight. LearnLens addresses key challenges in existing systems, offering scalable, high-quality feedback that empowers both teachers and students.
中文: LearnLens是一个基于大型语言模型的模块化系统,通过错误感知评估、课程基础生成和教师监督,为科学教育提供个性化、与课程一致的反馈,解决可扩展性和质量难题。
English: LearnLens is a modular, LLM-based system that generates personalized, curriculum-aligned feedback in science education through error-aware assessment, curriculum-grounded generation, and educator oversight, addressing scalability and quality challenges.
Authors:Nuo Chen, Pu Yan, Jia Li, Qixuan Zhao
Abstract:
Since its viral emergence in early 2024, Comment Robert-a Weibo-launched social chatbot-has gained widespread attention on the Chinese Internet for its unsolicited and unpredictable comments on user posts. Unlike conventional chatbots that respond only to user prompts, Robert autonomously intervenes in public discourse, representing a novel form of AI-driven social media engagement. This study examines how such autonomous, algorithmic communication reshapes human-AI interaction in everyday online contexts. Using computational linguistics techniques, including topic classification and sentiment analysis, we analyze over 3,900 user-submitted interactions from the "Robert Victims Alliance", a grassroots community documenting their exchanges with the chatbot. Topic modeling reveals six key themes: interpersonal relationships, self-identity, academic and career concerns, subcultures, sensitive topics, and social events. Complementing this, mixed-methods emotional analysis uncovers a complex affective spectrum: Robert's casual remarks can evoke warmth and humor but may also conceal covert hostility beneath neutral or polite language. These ambivalent interactions reveal an emerging emotional divide between humans and socially proactive AI, suggesting that while Robert simulates social presence, it often falls short of users' emotional needs. Our study contributes to human-AI interaction research by offering new insights into the affective dynamics and socio-technical implications of unsolicited AI bots' participation in digital public spheres.
中文:本研究通过计算语言学分析微博聊天机器人Comment Robert的主动评论行为,发现其虽能通过多样主题和复杂情感模拟社交存在,但常无法满足用户情感需求,揭示了数字公共领域中人与主动式AI间日益显现的情感鸿沟。
English: The study analyzes the Comment Robert chatbot's unsolicited interventions on Weibo, revealing through computational linguistics that while it simulates social presence through varied topics and mixed emotions, it often fails to meet users' emotional needs, highlighting an emerging human-AI affective divide in digital public spheres.
Authors:Andrea Pugnana, Giovanni De Toni, Cesare Barbera, Roberto Pellungrini, Bruno Lepri, Andrea Passerini
Abstract:
Developing decision-support systems that complement human performance in classification tasks remains an open challenge. A popular approach, Learning to Defer (LtD), allows a Machine Learning (ML) model to pass difficult cases to a human expert. However, LtD treats humans and ML models as mutually exclusive decision-makers, restricting the expert contribution to mere predictions. To address this limitation, we propose Learning to Ask (LtA), a new framework that handles both when and how to incorporate expert input in an ML model. LtA is based on a two-part architecture: a standard ML model and an enriched model trained with additional expert human feedback, with a formally optimal strategy for selecting when to query the enriched model. We provide two practical implementations of LtA: a sequential approach, which trains the models in stages, and a joint approach, which optimises them simultaneously. For the latter, we design surrogate losses with realisable-consistency guarantees. Our experiments with synthetic and real expert data demonstrate that LtA provides a more flexible and powerful foundation for effective human-AI collaboration.
Chinese: 提出的“学习询问”(LtA)框架通过确定何时及如何将专家反馈整合到机器学习模型中,增强了人机协作,并通过灵活的实现方式和提升的性能克服了现有方法的局限。
English: The proposed Learning to Ask (LtA) framework enhances human-AI collaboration by determining both when and how to integrate expert feedback into machine learning models, overcoming limitations of existing methods through flexible implementations and improved performance.
Authors:Jingjia Xiao, Qing Xiao, Hong Shen
Abstract:
Longer-running scams, such as romance fraud and "pig-butchering" scams, exploit not only victims' emotions but also the design of digital platforms. Scammers commonly leverage features such as professional-looking profile verification, algorithmic recommendations that reinforce contact, integrated payment systems, and private chat affordances to gradually establish trust and dependency with victims. Prior work in HCI and criminology has examined online scams through the lenses of detection mechanisms, threat modeling, and user-level vulnerabilities. However, less attention has been paid to how platform design itself enables longer-running scams. To address this gap, we conducted in-depth interviews with 25 longer-running scam victims in China. Our findings show how scammers strategically use platform affordances to stage credibility, orchestrate intimacy, and sustain coercion with victims. By analyzing scams as socio-technical projects, we highlight how platform design can be exploited in longer-running scams, and point to redesigning future platforms to better protect users.
Authors:Laura Koesten, Péter Ferenc Gyarmati, Connor Hogan, Bernhard Jordan, Seliem El-Sayed, Barbara Prainsack, Torsten Möller
Abstract:
We present PLUTO (Public VaLUe Assessment TOol), a framework for assessing the public value of specific instances of data use. Grounded in the concept of data solidarity, PLUTO aims to empower diverse stakeholders - including regulatory bodies, private enterprises, NGOs, and individuals - to critically engage with data projects through a structured assessment of the risks and benefits of data use, and by encouraging critical reflection. This paper discusses the theoretical foundation, development process, and initial user experiences with PLUTO. Key challenges include translating qualitative assessments of benefits and risks into actionable quantitative metrics while maintaining inclusivity and transparency. Initial feedback highlights PLUTO's potential to foster responsible decision-making and shared accountability in data practices.
Authors:Qing Hu, Qing Xiao, Hancheng Cao, Hong Shen
Abstract:
As Generative AI (GenAI) becomes increasingly embedded in the workplace, managers are beginning to create Manager Clone Agents - AI-powered digital surrogates that are trained on their work communications and decision patterns to perform managerial tasks on their behalf. To investigate this emerging phenomenon, we conducted six design fiction workshops (n = 23) with managers and workers, in which participants co-created speculative scenarios and discussed how Manager Clone Agents might transform collaborative work. We identified four potential roles that participants envisioned for Manager Clone Agents: proxy presence, informational conveyor belt, productivity engine, and leadership amplifier, while highlighting concerns spanning individual, interpersonal, and organizational levels. We provide design recommendations envisioned by both parties for integrating Manager Clone Agents responsibly into the future workplace, emphasizing the need to prioritize workers' perspectives, strengthen interpersonal bonds, and enable flexible clone configuration.
Authors:Rongyi Chen, Ziyan Xin, Qing Xiao, Ruiwei Xiao, Jingjia Xiao, Bingbing Zhang, Hong Shen, Zhicong Lu
Abstract:
The digital transformation of religious practice has reshaped how billions of people engage with spiritual content, with video-sharing platforms becoming central to contemporary religious communication. Yet HCI research lacks systematic understanding of how narrative and visual elements create meaningful spiritual experiences and foster viewer engagement. We present a mixed-methods study of religious videos on YouTube across major religions, developing taxonomies of narrative frameworks, visual elements, and viewer interaction. Using LLM-assisted analysis, we studied relationships between content characteristics and viewer responses. Religious videos predominantly adopt lecture-style formats with authority-based persuasion strategies, using salvation narratives for guidance. All prefer bright lighting, with Buddhism favoring warm tones and prominent symbols, Judaism preferring indoor settings, and Hinduism emphasizing sacred objects. We identified differentiated patterns of emotional sharing among religious viewers while revealing significant correlations between content characteristics and engagement, particularly regarding AI-generated content. We provide evidence-based guidance for creating inclusive and engaging spiritual media.
Authors:Qing Xiao, Xinlan Emily Hu, Mark E. Whiting, Arvind Karunakaran, Hong Shen, Hancheng Cao
Abstract:
When AI entered the workplace, many believed it could reshape teamwork as profoundly as it boosted individual productivity. Would AI finally ease the longstanding challenges of team collaboration? Our findings suggested a more complicated reality. We conducted a longitudinal two-wave interview study (2023-2025) with members (N=15) of a project-based software development organization to examine the expectations and use of AI in teamwork. In early 2023, just after the release of ChatGPT, participants envisioned AI as an intelligent coordinator that could align projects, track progress, and ease interpersonal frictions. By 2025, however, AI was used mainly to accelerate individual tasks such as coding, writing, and documentation, leaving persistent collaboration issues of performance accountability and fragile communication unresolved. Yet AI reshaped collaborative culture: efficiency became a norm, transparency and responsible use became markers of professionalism, and AI was increasingly accepted as part of teamwork.
Authors:Qing Xiao, Qing Hu, Jingjia Xiao, Hancheng Cao, Hong Shen
Abstract:
Generative AI (GenAI) is reshaping work, but adoption remains largely individual and experimental rather than integrated into collaborative routines. Whether GenAI can move from individual use to collaborative work is a critical question for future organizations. Journalism offers a compelling site to examine this shift: individual journalists have already been disrupted by GenAI tools; yet newswork is inherently collaborative relying on shared routines and coordinated workflows. We conducted 27 interviews with newsrooms managers, editors, and front-line journalists in China. We found that journalists frequently used GenAI to support daily tasks, but value alignment was safeguarded mainly through individual discretion. At the organizational level, GenAI use remained disconnected from team workflows, hindered by structural barriers and cultural reluctance to share practices. These findings underscore the gap between individual and collective adoption, pointing to the need for accounting for organizational structures, cultural norms, and workflow integration when designing GenAI for collaborative work.
Authors:Yike Shi, Qing Xiao, Qing Hu, Hong Shen, Hua Shen
Abstract:
Large language models can influence users through conversation, creating new forms of dark patterns that differ from traditional UX dark patterns. We define LLM dark patterns as manipulative or deceptive behaviors enacted in dialogue. Drawing on prior work and AI incident reports, we outline a diverse set of categories with real-world examples. Using them, we conducted a scenario-based study where participants (N=34) compared manipulative and neutral LLM responses. Our results reveal that recognition of LLM dark patterns often hinged on conversational cues such as exaggerated agreement, biased framing, or privacy intrusions, but these behaviors were also sometimes normalized as ordinary assistance. Users' perceptions of these dark patterns shaped how they respond to them. Responsibilities for these behaviors were also attributed in different ways, with participants assigning it to companies and developers, the model itself, or to users. We conclude with implications for design, advocacy, and governance to safeguard user autonomy.
Authors:Ruiwei Xiao, Qing Xiao, Xinying Hou, Phenyo Phemelo Moletsane, Hanqi Jane Li, Hong Shen, John Stamper
Abstract:
Generative artificial intelligence (GenAI) is rapidly entering K-12 classrooms worldwide, initiating urgent debates about its potential to either reduce or exacerbate educational inequalities. Drawing on interviews with 30 K-12 teachers across the United States, South Africa, and Taiwan, this study examines how teachers navigate this GenAI tension around educational equalities. We found teachers actively framed GenAI education as an equality-oriented practice: they used it to alleviate pre-existing inequalities while simultaneously working to prevent new inequalities from emerging. Despite these efforts, teachers confronted persistent systemic barriers, i.e., unequal infrastructure, insufficient professional training, and restrictive social norms, that individual initiative alone could not overcome. Teachers thus articulated normative visions for more inclusive GenAI education. By centering teachers' practices, constraints, and future envisions, this study contributes a global account of how GenAI education is being integrated into K-12 contexts and highlights what is required to make its adoption genuinely equal.
Authors:Ruiwei Xiao, Qing Xiao, Xinying Hou, Hanqi Jane Li, Phenyo Phemelo Moletsane, Hong Shen, John Stamper
Abstract:
Generative AI (GenAI) is rapidly entering K-12 classrooms, offering teachers new ways for teaching practices. Yet GenAI models are often trained on culturally uneven datasets, embedding a "default culture" that often misaligns with local classrooms. To understand how teachers navigate this gap, we defined the new concept Cultural Distance (the gap between GenAI's default cultural repertoire and the situated demands of teaching practice) and conducted in-depth interviews with 30 K-12 teachers, 10 each from South Africa, Taiwan, and the United States, who had integrated AI into their teaching practice. These teachers' experiences informed the development of our three-level cultural distance framework. This work contributes the concept and framework of cultural distance, six illustrative instances spanning in low, mid, high distance levels with teachers' experiences and strategies for addressing them. Empirically, we offer implications to help AI designers, policymakers, and educators create more equitable and culturally responsive GenAI tools for education.
Authors:Péter Ferenc Gyarmati, Manfred Klaffenböck, Laura Koesten, Torsten Möller
Abstract:
Vision-language models (VLMs) hold promise for enhancing visualization tools, but effective human-AI collaboration hinges on a shared perceptual understanding of visual content. Prior studies assessed VLM visualization literacy through interpretive tasks, revealing an over-reliance on textual cues rather than genuine visual analysis. Our study investigates a more foundational skill underpinning such literacy: the ability of VLMs to recognize a chart's core visual properties as humans do. We task 13 diverse VLMs with classifying scientific visualizations based solely on visual stimuli, according to three criteria: purpose (e.g., schematic, GUI, visualization), encoding (e.g., bar, point, node-link), and dimensionality (e.g., 2D, 3D). Using expert labels from the human-centric VisType typology as ground truth, we find that VLMs often identify purpose and dimensionality accurately but struggle with specific encoding types. Our preliminary results show that larger models do not always equate to superior performance and highlight the need for careful integration of VLMs in visualization tasks, with human supervision to ensure reliable outcomes.
Authors:Yi Wang, Chetan Arora, Xiao Liu, Thuong Hoang, ZHengxin Zhang, Henry Been Lirn Duh, John Grundy
Abstract:
Accessibility reviews provide valuable insights into both the limitations and benefits experienced by users with disabilities when using virtual reality (VR) applications. However, a comprehensive investigation into VR accessibility for users with disabilities is still lacking. To fill this gap, this study analyzes user reviews from the Meta and Steam stores of VR apps, focusing on the reported issues affecting users with disabilities. We applied selection criteria to 1,367,419 reviews from the top 40, the 20 most popular, and the 40 lowest-rated VR applications on both platforms. In total, 1,076 (0.078%) VR accessibility reviews referenced various disabilities across 100 VR applications. These applications were categorized into Action, Sports, Social, Puzzle, Horror, and Simulation, with Action receiving the highest number of accessibility related-reviews. We identified 16 different types of disabilities across six categories. Furthermore, we examined the causes of accessibility issues as reported by users with disabilities. Overall, VR accessibility reviews were predominantly under-supported.
Authors:Lynnette Hui Xian Ng, Divyaansh Sinha, Kathleen M. Carley
Abstract:
Social network motifs are recurring patterns of small subgraphs that indicate fundamental patterns of social communication. In this work, we study the simple star network motifs that recur on X during the COVID-19 discourse. We study the profile of the manifestation of the star network among bot and human users. There are six primary patterns of the star motif, differentiating by the bots and humans being either egos and alters. We describe the presentation of each of these six patterns in our data, demonstrating how the motif patterns can inform social media behavioral analysis.
中文摘要:本研究分析了X平台上COVID-19讨论中出现的六种星型网络模式,通过区分机器人与人类用户作为中心节点或边缘节点的行为差异,为社交媒体行为分析提供了新见解。
English Summary: This study analyzes six recurring star network motifs on X during COVID-19 discussions, distinguishing behavioral patterns between bot and human users as central or peripheral actors to enhance social media behavior analysis.
Authors:Yue Wu, Xiaolan Chen, Weiyi Zhang, Shunming Liu, Wing Man Rita Sum, Xinyuan Wu, Xianwen Shang, Chea-su Kee, Mingguang He, Danli Shi
Abstract:
Large language models (LLMs) show promise for tailored healthcare communication but face challenges in interpretability and multi-task integration particularly for domain-specific needs like myopia, and their real-world effectiveness as patient education tools has yet to be demonstrated. Here, we introduce ChatMyopia, an LLM-based AI agent designed to address text and image-based inquiries related to myopia. To achieve this, ChatMyopia integrates an image classification tool and a retrieval-augmented knowledge base built from literature, expert consensus, and clinical guidelines. Myopic maculopathy grading task, single question examination and human evaluations validated its ability to deliver personalized, accurate, and safe responses to myopia-related inquiries with high scalability and interpretability. In a randomized controlled trial (n=70, NCT06607822), ChatMyopia significantly improved patient satisfaction compared to traditional leaflets, enhancing patient education in accuracy, empathy, disease awareness, and patient-eyecare practitioner communication. These findings highlight ChatMyopia's potential as a valuable supplement to enhance patient education and improve satisfaction with medical services in primary eye care settings.
Chinese: ChatMyopia作为基于大语言模型的AI助手,通过整合图像分类工具和知识库,在近视相关咨询中提供个性化精准回复,显著提升了患者教育效果和眼科护理满意度,优于传统宣教方式。
English: ChatMyopia, an AI agent using large language models, effectively enhances patient education by providing personalized and accurate responses to myopia-related inquiries, significantly improving satisfaction and communication in eye care compared to traditional methods.
Authors:Jens V. Rüppel, Andrey Rudenko, Tim Schreiter, Martin Magnusson, Achim J. Lilienthal
Abstract:
The rapid development of Large Language Models (LLMs) creates an exciting potential for flexible, general knowledge-driven Human-Robot Interaction (HRI) systems for assistive robots. Existing HRI systems demonstrate great progress in interpreting and following user instructions, action generation, and robot task solving. On the other hand, bi-directional, multi-modal, and context-aware support of the user in collaborative tasks still remains an open challenge. In this paper, we present a gaze- and speech-informed interface to the assistive robot, which is able to perceive the working environment from multiple vision inputs and support the dynamic user in their tasks. Our system is designed to be modular and transferable to adapt to diverse tasks and robots, and it is capable of real-time use of language-based interaction state representation and fast on board perception modules. Its development was supported by multiple public dissemination events, contributing important considerations for improved robustness and user experience. Furthermore, in two lab studies, we compare the performance and user ratings of our system with those of a traditional scripted HRI pipeline. Our findings indicate that an LLM-based approach enhances adaptability and marginally improves user engagement and task execution metrics but may produce redundant output, while a scripted pipeline is well suited for more straightforward tasks.
中文: 本文提出了一种基于大型语言模型的模块化实时辅助机器人系统,通过多模态交互提升了人机协作的适应性和用户参与度,但相比传统脚本方法可能产生冗余输出。
English: This paper introduces a modular, real-time assistive robot system using Large Language Models to enhance adaptability and user engagement in Human-Robot Interaction, though it may generate some redundant outputs compared to traditional scripted approaches.
Authors:Shivam Chaubey, Francesco Verdoja, Shankar Deka, Ville Kyrki
Abstract:
Shared control combines human intention with autonomous decision-making, from low-level safety overrides to high-level task guidance, enabling systems that adapt to users while ensuring safety and performance. This enhances task effectiveness and user experience across domains such as assistive robotics, teleoperation, and autonomous driving. However, existing shared control methods, based on e.g. Model Predictive Control, Control Barrier Functions, or learning-based control, struggle with feasibility, scalability, or safety guarantees, particularly since the user input is unpredictable.
To address these challenges, we propose an assistive controller framework based on Constrained Optimal Control Problem that incorporates an offline-computed Control Invariant Set, enabling online computation of control actions that ensure feasibility, strict constraint satisfaction, and minimal override of user intent. Moreover, the framework can accommodate structured class of non-convex constraints, which are common in real-world scenarios. We validate the approach through a large-scale user study with 66 participants--one of the most extensive in shared control research--using a computer game environment to assess task load, trust, and perceived control, in addition to performance. The results show consistent improvements across all these aspects without compromising safety and user intent.
中文摘要:本研究提出的辅助控制框架通过离线计算控制不变集,解决了现有共享控制方法在可行性、安全性和用户意图覆盖方面的不足,大规模用户实验证明该框架在保证安全的同时显著提升了任务性能、用户信任度和控制感知。
English Summary: The proposed assistive controller framework overcomes limitations of existing shared control methods by ensuring safety, feasibility, and minimal user intent override through an offline-computed Control Invariant Set, with large-scale user studies confirming improved performance, trust, and perceived control.
Authors:LearnLM Team, Google, :, Alicia MartÃn, Amir Globerson, Amy Wang, Anirudh Shekhawat, Anna Iurchenko, Anisha Choudhury, Avinatan Hassidim, Ayça Ãakmakli, Ayelet Shasha Evron, Charlie Yang, Courtney Heldreth, Diana Akrong, Gal Elidan, Hairong Mu, Ian Li, Ido Cohen, Katherine Chou, Komal Singh, Lev Borovoi, Lidan Hackmon, Lior Belinsky, Michael Fink, Niv Efron, Preeti Singh, Rena Levitt, Shashank Agarwal, Shay Sharon, Tracey Lee-Joe, Xiaohong Hao, Yael Gold-Zamir, Yael Haramaty, Yishay Mor, Yoav Bar Sinai, Yossi Matias
Abstract:
Textbooks are a cornerstone of education, but they have a fundamental limitation: they are a one-size-fits-all medium. Any new material or alternative representation requires arduous human effort, so that textbooks cannot be adapted in a scalable manner. We present an approach for transforming and augmenting textbooks using generative AI, adding layers of multiple representations and personalization while maintaining content integrity and quality. We refer to the system built with this approach as Learn Your Way. We report pedagogical evaluations of the different transformations and augmentations, and present the results of a a randomized control trial, highlighting the advantages of learning with Learn Your Way over regular textbook usage.
Authors:Naimeng Ye, Xiao Yu, Ruize Xu, Tianyi Peng, Zhou Yu
Abstract:
Automated web testing plays a critical role in ensuring high-quality user experiences and delivering business value. Traditional approaches primarily focus on code coverage and load testing, but often fall short of capturing complex user behaviors, leaving many usability issues undetected. The emergence of large language models (LLM) and AI agents opens new possibilities for web testing by enabling human-like interaction with websites and a general awareness of common usability problems. In this work, we present WebProber, a prototype AI agent-based web testing framework. Given a URL, WebProber autonomously explores the website, simulating real user interactions, identifying bugs and usability issues, and producing a human-readable report. We evaluate WebProber through a case study of 120 academic personal websites, where it uncovered 29 usability issues--many of which were missed by traditional tools. Our findings highlight agent-based testing as a promising direction while outlining directions for developing next-generation, user-centered testing frameworks.
Authors:Runtong Wu, Jiayao Song, Fei Teng, Xianhao Ren, Yuyan Gao, Kailun Yang
Abstract:
Narrative inquiry has been one of the prominent application domains for the analysis of human experience, aiming to know more about the complexity of human society. However, researchers are often required to transform various forms of data into coherent hand-drafted narratives in storied form throughout narrative analysis, which brings an immense burden of data analysis. Participants, too, are expected to engage in member checking and presentation of these narrative products, which involves reviewing and responding to large volumes of documents. Given the dual burden and the need for more efficient and participant-friendly approaches to narrative making and representation, we made a first attempt: (i) a new paradigm is proposed, NAME, as the initial attempt to push the field of narrative inquiry. Name is able to transfer research documents into coherent story images, alleviating the cognitive burden of interpreting extensive text-based materials during member checking for both researchers and participants. (ii) We develop an actor location and shape module to facilitate plausible image generation. (iii) We have designed a set of robust evaluation metrics comprising three key dimensions to objectively measure the perceptual quality and narrative consistency of generated characters. Our approach consistently demonstrates state-of-the-art performance across different data partitioning schemes. Remarkably, while the baseline relies on the full 100% of the available data, our method requires only 0.96% yet still reduces the FID score from 195 to 152. Under identical data volumes, our method delivers substantial improvements: for the 70:30 split, the FID score decreases from 175 to 152, and for the 95:5 split, it is nearly halved from 96 to 49. Furthermore, the proposed model achieves a score of 3.62 on the newly introduced metric, surpassing the baseline score of 2.66.
Authors:Santiago Berrezueta-Guzman, Refia Daya, Stefan Wagner
Abstract:
Sign language (SL) is an essential mode of communication for Deaf and Hard-of-Hearing (DHH) individuals. Its education remains limited by the lack of qualified instructors, insufficient early exposure, and the inadequacy of traditional teaching methods. Recent advances in Virtual Reality (VR) and Artificial Intelligence (AI) offer promising new approaches to enhance sign language learning through immersive, interactive, and feedback-rich environments. This paper presents a systematic review of 55 peer-reviewed studies on VR-based sign language education, identifying and analyzing five core thematic areas: (1) gesture recognition and real-time feedback mechanisms; (2) interactive VR environments for communicative practice; (3) gamification for immersive and motivating learning experiences; (4) personalized and adaptive learning systems; and (5) accessibility and inclusivity for diverse DHH learners. The results reveal that AI-driven gesture recognition systems integrated with VR can provide real-time feedback, significantly improving learner engagement and performance. However, the analysis highlights critical challenges: hardware limitations, inconsistent accuracy in gesture recognition, and a lack of inclusive and adaptive design. This review contributes a comprehensive synthesis of technological and pedagogical innovations in the field, outlining current limitations and proposing actionable recommendations for developers and researchers. By bridging technical advancement with inclusive pedagogy, this review lays the foundation for next-generation VR systems that are equitable, effective, and accessible for sign language learners worldwide.
中文: 本系统综述探讨了虚拟现实与人工智能技术如何通过沉浸式环境和实时反馈改进手语教学,同时指出硬件限制与识别精度等亟待解决的挑战,以推动全球公平应用。
English: This systematic review examines how VR and AI technologies enhance sign language education through immersive environments and real-time feedback, while identifying challenges like hardware limitations and recognition inaccuracies that need addressing for equitable global implementation.
Authors:Santiago Berrezueta-Guzman, Stefan Wagner
Abstract:
Virtual reality (VR) development relies on game engines to provide real-time rendering, physics simulation, and interaction systems. Among the most widely used game engines, Unreal Engine and Unity dominate the industry, offering distinct advantages in graphics rendering, performance optimization, usability, resource requirements, and scalability. This study presents a comprehensive comparative analysis of both engines, evaluating their capabilities and trade-offs through empirical assessments and real-world case studies of large-scale VR projects. The findings highlight key factors such as rendering fidelity, computational efficiency, cross-platform compatibility, and development workflows. These provide practical insights for selecting the most suitable engine based on project-specific needs. Furthermore, emerging trends in artificial intelligence (AI)-driven enhancements, including Deep Learning Super Sampling (DLSS) and large language models (LLMs), are explored to assess their impact on VR development workflows. By aligning engine capabilities with technical and creative requirements, developers can overcome performance bottlenecks, enhance immersion, and streamline optimization techniques.
This study serves as a valuable resource for VR developers, researchers, and industry professionals, offering data-driven recommendations to navigate the evolving landscape of VR technology.
Authors:Esra Mehmedova, Santiago Berrezueta-Guzman, Stefan Wagner
Abstract:
The Apple Vision Pro is equipped with accurate eye-tracking capabilities, yet the privacy restrictions on the device prevent direct access to continuous user gaze data. This study introduces iTrace, a novel application that overcomes these limitations through click-based gaze extraction techniques, including manual methods like a pinch gesture, and automatic approaches utilizing dwell control or a gaming controller. We developed a system with a client-server architecture that captures the gaze coordinates and transforms them into dynamic heatmaps for video and spatial eye tracking. The system can generate individual and averaged heatmaps, enabling analysis of personal and collective attention patterns.
To demonstrate its effectiveness and evaluate the usability and performance, a study was conducted with two groups of 10 participants, each testing different clicking methods. The 8BitDo controller achieved higher average data collection rates at 14.22 clicks/s compared to 0.45 clicks/s with dwell control, enabling significantly denser heatmap visualizations. The resulting heatmaps reveal distinct attention patterns, including concentrated focus in lecture videos and broader scanning during problem-solving tasks. By allowing dynamic attention visualization while maintaining a high gaze precision of 91 %, iTrace demonstrates strong potential for a wide range of applications in educational content engagement, environmental design evaluation, marketing analysis, and clinical cognitive assessment. Despite the current gaze data restrictions on the Apple Vision Pro, we encourage developers to use iTrace only in research settings.
Authors:Xiao Fu, Hossein A. Rahmani, Bin Wu, Jerome Ramos, Emine Yilmaz, Aldo Lipani
Abstract:
Personalised text generation is essential for user-centric information systems, yet most evaluation methods overlook the individuality of users. We introduce \textbf{PREF}, a \textbf{P}ersonalised \textbf{R}eference-free \textbf{E}valuation \textbf{F}ramework that jointly measures general output quality and user-specific alignment without requiring gold personalised references. PREF operates in a three-step pipeline: (1) a coverage stage uses a large language model (LLM) to generate a comprehensive, query-specific guideline covering universal criteria such as factuality, coherence, and completeness; (2) a preference stage re-ranks and selectively augments these factors using the target user's profile, stated or inferred preferences, and context, producing a personalised evaluation rubric; and (3) a scoring stage applies an LLM judge to rate candidate answers against this rubric, ensuring baseline adequacy while capturing subjective priorities. This separation of coverage from preference improves robustness, transparency, and reusability, and allows smaller models to approximate the personalised quality of larger ones. Experiments on the PrefEval benchmark, including implicit preference-following tasks, show that PREF achieves higher accuracy, better calibration, and closer alignment with human judgments than strong baselines. By enabling scalable, interpretable, and user-aligned evaluation, PREF lays the groundwork for more reliable assessment and development of personalised language generation systems.
中文: PREF是一种无需参考标准的新型评估框架,通过三步流程综合衡量文本的通用质量和用户个性化匹配度,在准确性和与人类判断一致性上优于现有基准。
English: PREF is a novel reference-free evaluation framework that assesses both general text quality and user-specific alignment through a three-step pipeline, demonstrating superior accuracy and human judgment correlation compared to existing methods.
Authors:Esin Mehmedova, Santiago Berrezueta-Guzman, Stefan Wagner
Abstract:
Designing effective user interfaces (UIs) for virtual reality (VR) is essential to enhance user immersion, usability, comfort, and accessibility in virtual environments. Despite the growing adoption of VR across domains such as education, healthcare, gaming, and rehabilitation, there is a noticeable lack of unified and comprehensive design guidelines for VR UI design. To address this gap, we conducted a systematic literature review to identify existing best practices and propose complete and unified guidelines for UI development in VR.
Building on these insights, this research proposes a set of best practices to guide the creation of more effective VR interfaces. To demonstrate and validate these practices, we developed a VR application called \textit{FlUId} that showcases both good and bad UI design principles for direct comparison. A user study was conducted to evaluate the impact of the proposed guidelines. The findings aim to bridge the gap between theory and practice, offering concrete recommendations for VR designers and developers.
Authors:Süeda Ãzkaya, Santiago Berrezueta-Guzman, Stefan Wagner
Abstract:
The integration of Large Language Models (LLMs) into Virtual Reality (VR) games marks a paradigm shift in the design of immersive, adaptive, and intelligent digital experiences. This paper presents a comprehensive review of recent research at the intersection of LLMs and VR, examining how these models are transforming narrative generation, non-player character (NPC) interactions, accessibility, personalization, and game mastering. Drawing from an analysis of 62 peer reviewed studies published between 2018 and 2025, we identify key application domains ranging from emotionally intelligent NPCs and procedurally generated storytelling to AI-driven adaptive systems and inclusive gameplay interfaces. We also address the major challenges facing this convergence, including real-time performance constraints, memory limitations, ethical risks, and scalability barriers. Our findings highlight that while LLMs significantly enhance realism, creativity, and user engagement in VR environments, their effective deployment requires robust design strategies that integrate multimodal interaction, hybrid AI architectures, and ethical safeguards. The paper concludes by outlining future research directions in multimodal AI, affective computing, reinforcement learning, and open-source development, aiming to guide the responsible advancement of intelligent and inclusive VR systems.
Authors:Kohou Wang, ZhaoXiang Liu, Lin Bai, Kun Fan, Xiang Liu, Huan Hu, Kai Wang, Shiguo Lian
Abstract:
It is crucial that robots' performance can be improved after deployment, as they are inherently likely to encounter novel scenarios never seen before. This paper presents an innovative solution: an interactive learning-based robot system powered by a Multi-modal Large Language Model(MLLM). A key feature of our system is its ability to learn from natural dialogues with non-expert users. We also propose chain of question to clarify the exact intent of the question before providing an answer and dual-modality retrieval modules to leverage these interaction events to avoid repeating same mistakes, ensuring a seamless user experience before model updates, which is in contrast to current mainstream MLLM-based robotic systems. Our system marks a novel approach in robotics by integrating interactive learning, paving the way for superior adaptability and performance in diverse environments. We demonstrate the effectiveness and improvement of our method through experiments, both quantitively and qualitatively.
中文摘要:本文提出了一种基于多模态大语言模型的交互式学习机器人系统,通过与普通用户的自然对话进行学习,采用问题链澄清机制和双模态检索模块,有效提升机器人在不同环境中的适应性和性能表现。
English Summary: This paper introduces an interactive learning-based robot system using a Multi-modal Large Language Model that learns from natural dialogues with non-expert users, enhancing adaptability and performance through innovative features like chain of questioning and dual-modality retrieval.
Authors:Santiago Berrezueta-Guzman, WenChun Chen, Stefan Wagner
Abstract:
Virtual Reality (VR) offers promising avenues for innovative therapeutic interventions in populations with intellectual disabilities (ID). This paper presents the design, development, and evaluation of Space Exodus, a novel VR-based role-playing game specifically tailored for children with ID. By integrating immersive gameplay with therapeutic task design, Space Exodus aims to enhance concentration, cognitive processing, and fine motor skills through structured hand-eye coordination exercises. A six-week pre-test/post-test study was conducted with 16 children in Ecuador, using standardized assessments, the Toulouse-Pieron Cancellation Test, and the Moss Attention Rating Scale complemented by detailed observational metrics. Quantitative results indicate statistically significant improvements in concentration scores, with test scores increasing from 65.2 to 80.3 and 55.4 to 68.7, respectively (p < 0.01). Qualitative observations revealed reduced task attempts, enhanced user confidence, and increased active participation. The inclusion of a VR assistant provided consistent guidance that further boosted engagement. These findings demonstrate the potential of immersive, game-based learning environments as practical therapeutic tools, laying a robust foundation for developing inclusive and adaptive rehabilitation strategies for children with ID.
中文: 本文介绍了专为智障儿童设计的虚拟现实角色扮演游戏《太空出埃及记》,通过六周研究表明该游戏显著提升了儿童的专注力和认知能力,验证了沉浸式游戏作为有效治疗工具的潜力。
English: This paper introduces Space Exodus, a VR role-playing game designed for children with intellectual disabilities, demonstrating through a six-week study significant improvements in concentration and cognitive skills, thereby supporting the use of immersive games as effective therapeutic tools.
Authors:Ding Wang, Mark DÃaz, Charvi Rastogi, Aida Davani, Vinodkumar Prabhakaran, Pushkar Mishra, Roma Patel, Alicia Parrish, Zoe Ashwood, Michela Paganini, Tian Huey Teh, Verena Rieser, Lora Aroyo
Abstract:
Understanding what constitutes safety in AI-generated content is complex. While developers often rely on predefined taxonomies, real-world safety judgments also involve personal, social, and cultural perceptions of harm. This paper examines how annotators evaluate the safety of AI-generated images, focusing on the qualitative reasoning behind their judgments. Analyzing 5,372 open-ended comments, we find that annotators consistently invoke moral, emotional, and contextual reasoning that extends beyond structured safety categories. Many reflect on potential harm to others more than to themselves, grounding their judgments in lived experience, collective risk, and sociocultural awareness. Beyond individual perceptions, we also find that the structure of the task itself -- including annotation guidelines -- shapes how annotators interpret and express harm. Guidelines influence not only which images are flagged, but also the moral judgment behind the justifications. Annotators frequently cite factors such as image quality, visual distortion, and mismatches between prompt and output as contributing to perceived harm dimensions, which are often overlooked in standard evaluation frameworks. Our findings reveal that existing safety pipelines miss critical forms of reasoning that annotators bring to the task. We argue for evaluation designs that scaffold moral reflection, differentiate types of harm, and make space for subjective, context-sensitive interpretations of AI-generated content.
中文: 研究发现,标注者通过道德、情感和情境推理评估AI生成图像的安全性,这些判断常基于生活经验和社会文化认知,而现有安全评估框架却忽视了这些关键维度,任务结构和指导方针也显著影响伤害认定的方式。
English: This study reveals that human annotators assess the safety of AI-generated images through moral, emotional, and contextual reasoning beyond predefined categories, highlighting how task structures and guidelines shape harm interpretations while exposing gaps in current evaluation frameworks.
Authors:Ye Tian, Xiaoyuan Ren, Zihao Wang, Onat Gungor, Xiaofan Yu, Tajana Rosing
Abstract:
Rich and context-aware activity logs facilitate user behavior analysis and health monitoring, making them a key research focus in ubiquitous computing. The remarkable semantic understanding and generation capabilities of Large Language Models (LLMs) have recently created new opportunities for activity log generation. However, existing methods continue to exhibit notable limitations in terms of accuracy, efficiency, and semantic richness. To address these challenges, we propose DailyLLM. To the best of our knowledge, this is the first log generation and summarization system that comprehensively integrates contextual activity information across four dimensions: location, motion, environment, and physiology, using only sensors commonly available on smartphones and smartwatches. To achieve this, DailyLLM introduces a lightweight LLM-based framework that integrates structured prompting with efficient feature extraction to enable high-level activity understanding. Extensive experiments demonstrate that DailyLLM outperforms state-of-the-art (SOTA) log generation methods and can be efficiently deployed on personal computers and Raspberry Pi. Utilizing only a 1.5B-parameter LLM model, DailyLLM achieves a 17% improvement in log generation BERTScore precision compared to the 70B-parameter SOTA baseline, while delivering nearly 10x faster inference speed.
中文摘要:DailyLLM提出了一种轻量级框架,利用智能手机和智能手表的传感器,通过整合位置、运动、环境和生理四个维度的上下文数据来生成活动日志,在BERTScore精度上相比现有方法提升17%,推理速度提高近10倍。
English Summary: DailyLLM introduces a lightweight framework using smartphone and smartwatch sensors to generate activity logs by integrating contextual data across four dimensions, achieving superior accuracy and efficiency over existing methods with a 17% BERTScore improvement and 10x faster inference speed.
Authors:Kyungtae Han, Yitao Chen, Rohit Gupta, Onur Altintas
Abstract:
While autonomous driving technologies continue to advance, current Advanced Driver Assistance Systems (ADAS) remain limited in their ability to interpret scene context or engage with drivers through natural language. These systems typically rely on predefined logic and lack support for dialogue-based interaction, making them inflexible in dynamic environments or when adapting to driver intent. This paper presents Scene-Aware Conversational ADAS (SC-ADAS), a modular framework that integrates Generative AI components including large language models, vision-to-text interpretation, and structured function calling to enable real-time, interpretable, and adaptive driver assistance. SC-ADAS supports multi-turn dialogue grounded in visual and sensor context, allowing natural language recommendations and driver-confirmed ADAS control. Implemented in the CARLA simulator with cloud-based Generative AI, the system executes confirmed user intents as structured ADAS commands without requiring model fine-tuning. We evaluate SC-ADAS across scene-aware, conversational, and revisited multi-turn interactions, highlighting trade-offs such as increased latency from vision-based context retrieval and token growth from accumulated dialogue history. These results demonstrate the feasibility of combining conversational reasoning, scene perception, and modular ADAS control to support the next generation of intelligent driver assistance.
中文: 本文提出SC-ADAS框架,通过生成式AI实现基于场景感知和自然语言对话的实时可解释驾驶辅助,但面临视觉上下文检索延迟和对话历史增长等挑战。
English: This paper introduces SC-ADAS, a modular framework that leverages Generative AI to enable real-time, interpretable driver assistance through natural language dialogue and scene-aware interactions, though it faces challenges like increased latency and token growth.
Authors:Oleksandra Sobchyshak, Santiago Berrezueta-Guzman, Stefan Wagner
Abstract:
Unreal Engine is a platform that has influenced immersive storytelling and virtual reality (VR) through its advanced features and diverse applications. This paper provides an in-depth technical review of Unreal Engine. It analyzes its key innovations in creating hyper-realistic environments and emotionally engaging narratives, with significant applications in gaming, virtual production, education, cultural preservation, and healthcare. The findings of this article highlight Unreal Engine's transformative impact across industries, demonstrating its ability to merge storytelling with cutting-edge technologies. Case studies illustrate how Unreal Engine facilitates seamless visuals, audio, and interactivity integration to create compelling experiences. Additionally, this study identifies Unreal Engine's versatility in applications ranging from procedural content generation and AI-driven workflows to smart city simulations and VR-based rehabilitation programs.
While Unreal Engine sets new benchmarks for visual fidelity and interactivity, this paper underscores critical challenges, including its high hardware demands, limited accessibility, and ethical concerns related to over-immersion and data privacy. Addressing these challenges through cloud-based rendering, inclusive design, and ethical practices is essential for broader adoption and sustainability. This review concludes that Unreal Engine is suitable for innovation and interdisciplinary collaboration. Its ability to empower creators, redefine workflows, and push the boundaries of immersive storytelling positions Unreal Engine as pivotal in shaping the future of virtual reality and interactive media.
虚幻引擎通过其超逼真环境和跨行业应用革新了沉浸式叙事与虚拟现实,但也面临硬件要求高和伦理问题等挑战,需通过云端渲染与包容性设计推动更广泛采用。
Unreal Engine revolutionizes immersive storytelling and VR with its hyper-realistic environments and versatile applications across industries, though it faces challenges like high hardware demands and ethical concerns that require solutions for wider adoption.
Authors:Mert İnan, Anthony Sicilia, Alex Xie, Saujas Vaduguru, Daniel Fried, Malihe Alikhani
Abstract:
Establishing shared goals is a fundamental step in human-AI communication. However, ambiguities can lead to outputs that seem correct but fail to reflect the speaker's intent. In this paper, we explore this issue with a focus on the data visualization domain, where ambiguities in natural language impact the generation of code that visualizes data. The availability of multiple views on the contextual (e.g., the intended plot and the code rendering the plot) allows for a unique and comprehensive analysis of diverse ambiguity types. We develop a taxonomy of types of ambiguity that arise in this task and propose metrics to quantify them. Using Matplotlib problems from the DS-1000 dataset, we demonstrate that our ambiguity metrics better correlate with human annotations than uncertainty baselines. Our work also explores how multi-turn dialogue can reduce ambiguity, therefore, improve code accuracy by better matching user goals. We evaluate three pragmatic models to inform our dialogue strategies: Gricean Cooperativity, Discourse Representation Theory, and Questions under Discussion. A simulated user study reveals how pragmatic dialogues reduce ambiguity and enhance code accuracy, highlighting the value of multi-turn exchanges in code generation.
Authors:Tzu-Sheng Kuo, Sophia Liu, Quan Ze Chen, Joseph Seering, Amy X. Zhang, Haiyi Zhu, Kenneth Holstein
Abstract:
AI agents, or bots, serve important roles in online communities. However, they are often designed by outsiders or a few tech-savvy members, leading to bots that may not align with the broader community's needs. How might communities collectively shape the behavior of community bots? We present Botender, a system that enables communities to collaboratively design LLM-powered bots without coding. With Botender, community members can directly propose, iterate on, and deploy custom bot behaviors tailored to community needs. Botender facilitates testing and iteration on bot behavior through case-based provocations: interaction scenarios generated to spark user reflection and discussion around desirable bot behavior. A validation study found these provocations more useful than standard test cases for revealing improvement opportunities and surfacing disagreements. During a five-day deployment across six Discord servers, Botender supported communities in tailoring bot behavior to their specific needs, showcasing the usefulness of case-based provocations in facilitating collaborative bot design.
Authors:Lujain Ibrahim, Katherine M. Collins, Sunnie S. Y. Kim, Anka Reuel, Max Lamparth, Kevin Feng, Lama Ahmad, Prajna Soni, Alia El Kattan, Merlin Stein, Siddharth Swaroop, Ilia Sucholutsky, Andrew Strait, Q. Vera Liao, Umang Bhatt
Abstract:
Large language models (LLMs) distinguish themselves from previous technologies by functioning as collaborative "thought partners," capable of engaging more fluidly in natural language. As LLMs increasingly influence consequential decisions across diverse domains from healthcare to personal advice, the risk of overreliance - relying on LLMs beyond their capabilities - grows. This position paper argues that measuring and mitigating overreliance must become central to LLM research and deployment. First, we consolidate risks from overreliance at both the individual and societal levels, including high-stakes errors, governance challenges, and cognitive deskilling. Then, we explore LLM characteristics, system design features, and user cognitive biases that - together - raise serious and unique concerns about overreliance in practice. We also examine historical approaches for measuring overreliance, identifying three important gaps and proposing three promising directions to improve measurement. Finally, we propose mitigation strategies that the AI research community can pursue to ensure LLMs augment rather than undermine human capabilities.
Authors:Xinye Cao, Hongcan Guo, Guoshun Nan, Jiaoyang Cui, Haoting Qian, Yihan Lin, Yilin Peng, Diyang Zhang, Yanzhao Hou, Huici Wu, Xiaofeng Tao, Tony Q. S. Quek
Abstract:
Interactive multimodal applications (IMAs), such as route planning in the Internet of Vehicles, enrich users' personalized experiences by integrating various forms of data over wireless networks. Recent advances in large language models (LLMs) utilize mixture-of-experts (MoE) mechanisms to empower multiple IMAs, with each LLM trained individually for a specific task that presents different business workflows. In contrast to existing approaches that rely on multiple LLMs for IMAs, this paper presents a novel paradigm that accomplishes various IMAs using a single compositional LLM over wireless networks. The two primary challenges include 1) guiding a single LLM to adapt to diverse IMA objectives and 2) ensuring the flexibility and efficiency of the LLM in resource-constrained mobile environments. To tackle the first challenge, we propose ContextLoRA, a novel method that guides an LLM to learn the rich structured context among IMAs by constructing a task dependency graph. We partition the learnable parameter matrix of neural layers for each IMA to facilitate LLM composition. Then, we develop a step-by-step fine-tuning procedure guided by task relations, including training, freezing, and masking phases. This allows the LLM to learn to reason among tasks for better adaptation, capturing the latent dependencies between tasks. For the second challenge, we introduce ContextGear, a scheduling strategy to optimize the training procedure of ContextLoRA, aiming to minimize computational and communication costs through a strategic grouping mechanism. Experiments on three benchmarks show the superiority of the proposed ContextLoRA and ContextGear. Furthermore, we prototype our proposed paradigm on a real-world wireless testbed, demonstrating its practical applicability for various IMAs. We will release our code to the community.
中文: 本文提出了一种新颖的范式,利用单一组合式大语言模型在无线网络中处理多种交互式多模态应用,通过ContextLoRA实现任务适应和ContextGear优化调度,经实验和实际原型验证了其有效性。
English: This paper introduces a novel paradigm using a single compositional large language model (LLM) to handle multiple interactive multimodal applications (IMAs) over wireless networks, addressing challenges through ContextLoRA for task adaptation and ContextGear for efficient scheduling, validated by experiments and a real-world prototype.
Authors:Isaac Ngui, Courtney McBeth, André Santos, Grace He, Katherine J. Mimnaugh, James D. Motes, Luciano Soares, Marco Morales, Nancy M. Amato
Abstract:
We propose the Extended Reality Universal Planning Toolkit (ERUPT), an extended reality (XR) system for interactive motion planning. Our system allows users to create and dynamically reconfigure environments while they plan robot paths. In immersive three-dimensional XR environments, users gain a greater spatial understanding. XR also unlocks a broader range of natural interaction capabilities, allowing users to grab and adjust objects in the environment similarly to the real world, rather than using a mouse and keyboard with the scene projected onto a two-dimensional computer screen. Our system integrates with MoveIt, a manipulation planning framework, allowing users to send motion planning requests and visualize the resulting robot paths in virtual or augmented reality. We provide a broad range of interaction modalities, allowing users to modify objects in the environment and interact with a virtual robot. Our system allows operators to visualize robot motions, ensuring desired behavior as it moves throughout the environment, without risk of collisions within a virtual space, and to then deploy planned paths on physical robots in the real world.
Authors:Eason Chen, Sophia Judicke, Kayla Beigh, Xinyi Tang, Zimo Xiao, Chuangji Li, Shizhuo Li, Reed Luttmer, Shreya Singh, Maria Yampolsky, Naman Parikh, Yi Zhao, Meiyi Chen, Scarlett Huang, Anishka Mohanty, Gregory Johnson, John Mackey, Jionghao Lin, Ken Koedinger
Abstract:
We evaluate the effectiveness of LLM-Tutor, a large language model (LLM)-powered tutoring system that combines an AI-based proof-review tutor for real-time feedback on proof-writing and a chatbot for mathematics-related queries. Our experiment, involving 148 students, demonstrated that the use of LLM-Tutor significantly improved homework performance compared to a control group without access to the system. However, its impact on exam performance and time spent on tasks was found to be insignificant. Mediation analysis revealed that students with lower self-efficacy tended to use the chatbot more frequently, which partially contributed to lower midterm scores. Furthermore, students with lower self-efficacy were more likely to engage frequently with the proof-review-AI-tutor, a usage pattern that positively contributed to higher final exam scores. Interviews with 19 students highlighted the accessibility of LLM-Tutor and its effectiveness in addressing learning needs, while also revealing limitations and concerns regarding potential over-reliance on the tool. Our results suggest that generative AI alone like chatbot may not suffice for comprehensive learning support, underscoring the need for iterative design improvements with learning sciences principles with generative AI educational tools like LLM-Tutor.
Authors:Eason Chen, Jeffrey Li, Scarlett Huang, Xinyi Tang, Jionghao Lin, Paulo Carvalho, Kenneth Koedinger
Abstract:
We present an empirical study of how both experienced tutors and non-tutors judge the correctness of tutor praise responses under different Artificial Intelligence (AI)-assisted interfaces, types of explanation (textual explanations vs. inline highlighting). We first fine-tuned several Large Language Models (LLMs) to produce binary correctness labels and explanations, achieving up to 88% accuracy and 0.92 F1 score with GPT-4. We then let the GPT-4 models assist 95 participants in tutoring decision-making tasks by offering different types of explanations. Our findings show that although human-AI collaboration outperforms humans alone in evaluating tutor responses, it remains less accurate than AI alone. Moreover, we find that non-tutors tend to follow the AI's advice more consistently, which boosts their overall accuracy on the task: especially when the AI is correct. In contrast, experienced tutors often override the AI's correct suggestions and thus miss out on potential gains from the AI's generally high baseline accuracy. Further analysis reveals that explanations in text reasoning will increase over-reliance and reduce underreliance, while inline highlighting does not. Moreover, neither explanation style actually has a significant effect on performance and costs participants more time to complete the task, instead of saving time. Our findings reveal a tension between expertise, explanation design, and efficiency in AI-assisted decision-making, highlighting the need for balanced approaches that foster more effective human-AI collaboration.
Authors:He Zhang, Yueyan Liu, Xin Guan, Jie Cai, John M. Carroll
Abstract:
Semi-structured interviews highly rely on the quality of follow-up questions, yet interviewers' knowledge and skills may limit their depth and potentially affect outcomes. While many studies have shown the usefulness of large language models (LLMs) for qualitative analysis, their possibility in the data collection process remains underexplored. We adopt an AI-driven "Wizard-of-Oz" setup to investigate how real-time LLM support in generating follow-up questions shapes semi-structured interviews. Through a study with 17 participants, we examine the value of LLM-generated follow-up questions, the evolving division of roles, relationships, collaborative behaviors, and responsibilities between interviewers and AI. Our findings (1) provide empirical evidence of the strengths and limitations of AI-generated follow-up questions (AGQs); (2) introduce a Human-AI collaboration framework in this interview context; and (3) propose human-centered design guidelines for AI-assisted interviewing. We position LLMs as complements, not replacements, to human judgment, and highlight pathways for integrating AI into qualitative data collection.
Authors:Zheng Wei, Jia Sun, Junxiang Liao, Lik-Hang Lee, Pan Hui, Huamin Qu, Wai Tong, Xian Xu
Abstract:
In virtual reality (VR) education, especially in creative fields like film production, avatar design and narrative style extend beyond appearance and aesthetics. This study explores how the interaction between avatar gender, the dominant narrative actor's gender, and the learner's gender influences film production learning in VR, focusing on gaze dynamics and gender perspectives. Using a 2*2*2 experimental design, 48 participants operated avatars of different genders and interacted with male or female-dominant narratives. The results show that the consistency between the avatar and gender affects presence, and learners' control over the avatar is also influenced by gender matching. Learners using avatars of the opposite gender reported stronger control, suggesting gender incongruity prompted more focus on the avatar. Additionally, female participants with female avatars were more likely to adopt a "female gaze," favoring soft lighting and emotional shots, while male participants with male avatars were more likely to adopt a "male gaze," choosing dynamic shots and high contrast. When male participants used female avatars, they favored "female gaze," while female participants with male avatars focused on "male gaze". These findings advance our understanding of how avatar design and narrative style in VR-based education influence creativity and the cultivation of gender perspectives, and they offer insights for developing more inclusive and diverse VR teaching tools going forward.
Authors:Faruk Alpay, Hamdi Alakkad
Abstract:
We develop a rigorous measure-theoretic framework for the analysis of fixed points of nonexpansive maps in the space $L^1(μ)$, with explicit consideration of quantization errors arising in fixed-point arithmetic. Our central result shows that every bounded, closed, convex subset of $L^1(μ)$ that is compact in the topology of local convergence in measure (a property we refer to as measure-compactness) enjoys the fixed point property for nonexpansive mappings. The proof relies on techniques from uniform integrability, convexity in measure, and normal structure theory, including an application of Kirk's theorem. We further analyze the effect of quantization by modeling fixed-point arithmetic as a perturbation of a nonexpansive map, establishing the existence of approximate fixed points under measure-compactness conditions. We also present counterexamples that illustrate the optimality of our assumptions. Beyond the theoretical development, we apply this framework to a human-in-the-loop co-editing system. By formulating the interaction between an AI-generated proposal, a human editor, and a quantizer as a composition of nonexpansive maps on a measure-compact set, we demonstrate the existence of a "stable consensus artefact". We prove that such a consensus state remains an approximate fixed point even under bounded quantization errors, and we provide a concrete example of a human-AI editing loop that fits this framework. Our results underscore the value of measure-theoretic compactness in the design and verification of reliable collaborative systems involving humans and artificial agents.
Authors:Jiaqi Chen, Yanzhe Zhang, Yutong Zhang, Yijia Shao, Diyi Yang
Abstract:
Large language models (LLMs) are increasingly seen as assistants, copilots, and consultants, capable of supporting a wide range of tasks through natural conversation. However, most systems remain constrained by a linear request-response format that often makes interactions inefficient in multi-turn, information-dense, and exploratory tasks. To address these limitations, we propose Generative Interfaces for Language Models, a paradigm in which LLMs respond to user queries by proactively generating user interfaces (UIs) that enable more adaptive and interactive engagement. Our framework leverages structured interface-specific representations and iterative refinements to translate user queries into task-specific UIs. For systematic evaluation, we introduce a multidimensional assessment framework that compares generative interfaces with traditional chat-based ones across diverse tasks, interaction patterns, and query types, capturing functional, interactive, and emotional aspects of user experience. Results show that generative interfaces consistently outperform conversational ones, with humans preferring them in over 70% of cases. These findings clarify when and why users favor generative interfaces, paving the way for future advancements in human-AI interaction.
Authors:Lucio La Cava, Dominik Macko, Róbert Móro, Ivan Srba, Andrea Tagarelli
Abstract:
As Large Language Models (LLMs) have reached human-like fluency and coherence, distinguishing machine-generated text (MGT) from human-written content becomes increasingly difficult. While early efforts in MGT detection have focused on binary classification, the growing landscape and diversity of LLMs require a more fine-grained yet challenging authorship attribution (AA), i.e., being able to identify the precise generator (LLM or human) behind a text. However, AA remains nowadays confined to a monolingual setting, with English being the most investigated one, overlooking the multilingual nature and usage of modern LLMs. In this work, we introduce the problem of Multilingual Authorship Attribution, which involves attributing texts to human or multiple LLM generators across diverse languages. Focusing on 18 languages -- covering multiple families and writing scripts -- and 8 generators (7 LLMs and the human-authored class), we investigate the multilingual suitability of monolingual AA methods, their cross-lingual transferability, and the impact of generators on attribution performance. Our results reveal that while certain monolingual AA methods can be adapted to multilingual settings, significant limitations and challenges remain, particularly in transferring across diverse language families, underscoring the complexity of multilingual AA and the need for more robust approaches to better match real-world scenarios.
Authors:Runhua Zhang, Yang Ouyang, Leixian Shen, Yuying Tang, Xiaojuan Ma, Huamin Qu, Xian Xu
Abstract:
Researchers frequently need to synthesize their own publications into coherent narratives that demonstrate their scholarly contributions. To suit diverse communication contexts, exploring alternative ways to organize one's work while maintaining coherence is particularly challenging, especially in interdisciplinary fields like HCI where individual researchers' publications may span diverse domains and methodologies. In this paper, we present PaperBridge, a human-AI co-exploration system informed by a formative study and content analysis. PaperBridge assists researchers in exploring diverse perspectives for organizing their publications into coherent narratives. At its core is a bi-directional analysis engine powered by large language models, supporting iterative exploration through both top-down user intent (e.g., determining organization structure) and bottom-up refinement on narrative components (e.g., thematic paper groupings). Our user study (N=12) demonstrated PaperBridge's usability and effectiveness in facilitating the exploration of alternative research narratives. Our findings also provided empirical insights into how interactive systems can scaffold academic communication tasks.
Authors:Devin Lange, Shanghua Gao, Pengwei Sui, Austen Money, Priya Misner, Marinka Zitnik, Nils Gehlenborg
Abstract:
Incorporating natural language input has the potential to improve the capabilities of biomedical data discovery interfaces. However, user interface elements and visualizations are still powerful tools for interacting with data, even in the new world of generative AI. In our prototype system, YAC, Yet Another Chatbot, we bridge the gap between natural language and interactive visualizations by generating structured declarative output with a multi-agent system and interpreting that output to render linked interactive visualizations and apply data filters. Furthermore, we include widgets, which allow users to adjust the values of that structured output through user interface elements. We reflect on the capabilities and design of this system with an analysis of its technical dimensions and illustrate the capabilities through four usage scenarios.
Authors:Baiyu Chen, Benjamin Tag, Hao Xue, Daniel Angus, Flora Salim
Abstract:
Automated ad targeting on social media is opaque, creating risks of exploitation and invisibility to external scrutiny. Users may be steered toward harmful content while independent auditing of these processes remains blocked. Large Language Models (LLMs) raise a new concern: the potential to reverse-engineer sensitive user attributes from exposure alone. We introduce a multi-stage auditing framework to investigate these risks. First, a large-scale audit of over 435,000 ad impressions delivered to 891 Australian Facebook users reveals algorithmic biases, including disproportionate Gambling and Politics ads shown to socioeconomically vulnerable and politically aligned groups. Second, a multimodal LLM can reconstruct users' demographic profiles from ad streams, outperforming census-based baselines and matching or exceeding human performance. Our results provide the first empirical evidence that ad streams constitute rich digital footprints for public AI inference, highlighting urgent privacy risks and the need for content-level auditing and governance.
Authors:Devin Lange, Shanghua Gao, Pengwei Sui, Austen Money, Priya Misner, Marinka Zitnik, Nils Gehlenborg
Abstract:
We explore the potential for combining generative AI with grammar-based visualizations for biomedical data discovery. In our prototype, we use a multi-agent system to generate visualization specifications and apply filters. These visualizations are linked together, resulting in an interactive dashboard that is progressively constructed. Our system leverages the strengths of natural language while maintaining the utility of traditional user interfaces. Furthermore, we utilize generated interactive widgets enabling user adjustment. Finally, we demonstrate the potential utility of this system for biomedical data discovery with a case study.
Authors:Ana Davila, Jacinto Colan, Yasuhisa Hasegawa
Abstract:
Effective human-robot collaboration in surgery is affected by the inherent ambiguity of verbal communication. This paper presents a framework for a robotic surgical assistant that interprets and disambiguates verbal instructions from a surgeon by grounding them in the visual context of the operating field. The system employs a two-level affordance-based reasoning process that first analyzes the surgical scene using a multimodal vision-language model and then reasons about the instruction using a knowledge base of tool capabilities. To ensure patient safety, a dual-set conformal prediction method is used to provide a statistically rigorous confidence measure for robot decisions, allowing it to identify and flag ambiguous commands. We evaluated our framework on a curated dataset of ambiguous surgical requests from cholecystectomy videos, demonstrating a general disambiguation rate of 60% and presenting a method for safer human-robot interaction in the operating room.
Authors:Xin Wang, Ting Dang, Xinyu Zhang, Vassilis Kostakos, Michael J. Witbrock, Hong Jia
Abstract:
Mobile and wearable healthcare monitoring play a vital role in facilitating timely interventions, managing chronic health conditions, and ultimately improving individuals' quality of life. Previous studies on large language models (LLMs) have highlighted their impressive generalization abilities and effectiveness in healthcare prediction tasks. However, most LLM-based healthcare solutions are cloud-based, which raises significant privacy concerns and results in increased memory usage and latency. To address these challenges, there is growing interest in compact models, Small Language Models (SLMs), which are lightweight and designed to run locally and efficiently on mobile and wearable devices. Nevertheless, how well these models perform in healthcare prediction remains largely unexplored. We systematically evaluated SLMs on health prediction tasks using zero-shot, few-shot, and instruction fine-tuning approaches, and deployed the best performing fine-tuned SLMs on mobile devices to evaluate their real-world efficiency and predictive performance in practical healthcare scenarios. Our results show that SLMs can achieve performance comparable to LLMs while offering substantial gains in efficiency and privacy. However, challenges remain, particularly in handling class imbalance and few-shot scenarios. These findings highlight SLMs, though imperfect in their current form, as a promising solution for next-generation, privacy-preserving healthcare monitoring.
Authors:Jinrui Yang, Xudong Han, Timothy Baldwin
Abstract:
We introduce EuroParlVote, a novel benchmark for evaluating large language models (LLMs) in politically sensitive contexts. It links European Parliament debate speeches to roll-call vote outcomes and includes rich demographic metadata for each Member of the European Parliament (MEP), such as gender, age, country, and political group. Using EuroParlVote, we evaluate state-of-the-art LLMs on two tasks -- gender classification and vote prediction -- revealing consistent patterns of bias. We find that LLMs frequently misclassify female MEPs as male and demonstrate reduced accuracy when simulating votes for female speakers. Politically, LLMs tend to favor centrist groups while underperforming on both far-left and far-right ones. Proprietary models like GPT-4o outperform open-weight alternatives in terms of both robustness and fairness. We release the EuroParlVote dataset, code, and demo to support future research on fairness and accountability in NLP within political contexts.
Authors:Matte Lim, Catherine Yeh, Martin Wattenberg, Fernanda Viégas, Panagiotis Michalatos
Abstract:
Many real-world datasets -- from an artist's body of work to a person's social media history -- exhibit meaningful semantic changes over time that are difficult to capture with existing dimensionality reduction methods. To address this gap, we introduce a visualization technique that combines force-based projection and streaming clustering methods to build a spatial-temporal map of embeddings. Applying this technique, we create Chronotome, a tool for interactively exploring evolving themes in time-based data -- in real time. We demonstrate the utility of our approach through use cases on text and image data, showing how it offers a new lens for understanding the aesthetics and semantics of temporal datasets.
Authors:Yifan Zhang, Chen Huang, Yueke Zhang, Jiahao Zhang, Toby Jia-Jun Li, Collin McMillan, Kevin Leach, Yu Huang
Abstract:
Code language models (so-called CodeLLMs) are now commonplace in software development. As a general rule, CodeLLMs are trained by dividing training examples into input tokens and then learn importance of those tokens in a process called machine attention. Machine attention is based solely on input token salience to output token examples during training. Human software developers are different, as humans intuitively know that some tokens are more salient than others. While intuition itself is ineffable and a subject of philosophy, clues about salience are present in human visual attention, since people tend to look at more salient words more often. In this paper, we present EyeMulator, a technique for training CodeLLMs to mimic human visual attention while training for various software development tasks. We add special weights for each token in each input example to the loss function used during LLM fine-tuning. We draw these weights from observations of human visual attention derived from a previously-collected publicly-available dataset of eye-tracking experiments in software engineering tasks. These new weights ultimately induce changes in the attention of the subject LLM during training, resulting in a model that does not need eye-tracking data during inference. Our evaluation shows that EyeMulator outperforms strong LLM baselines on several tasks such as code translation, completion and summarization. We further show an ablation study that demonstrates the improvement is due to subject models learning to mimic human attention.
Authors:Maximilian Burzer, Tobias King, Till Riedel, Michael Beigl, Tobias Röddiger
Abstract:
The lack of standardization across Wearable Human Activity Recognition (WHAR) datasets limits reproducibility, comparability, and research efficiency. We introduce WHAR datasets, an open-source library designed to simplify WHAR data handling through a standardized data format and a configuration-driven design, enabling reproducible and computationally efficient workflows with minimal manual intervention. The library currently supports 9 widely-used datasets, integrates with PyTorch and TensorFlow, and is easily extensible to new datasets. To demonstrate its utility, we trained two state-of-the-art models, TinyHar and MLP-HAR, on the included datasets, approximately reproducing published results and validating the library's effectiveness for experimentation and benchmarking. Additionally, we evaluated preprocessing performance and observed speedups of up to 3.8x using multiprocessing. We hope this library contributes to more efficient, reproducible, and comparable WHAR research.
Authors:Catherine Yeh, Tara Menon, Robin Singh Arya, Helen He, Moira Weigel, Fernanda Viégas, Martin Wattenberg
Abstract:
Analyzing literature involves tracking interactions between characters, locations, and themes. Visualization has the potential to facilitate the mapping and analysis of these complex relationships, but capturing structured information from unstructured story data remains a challenge. As large language models (LLMs) continue to advance, we see an opportunity to use their text processing and analysis capabilities to augment and reimagine existing storyline visualization techniques. Toward this goal, we introduce an LLM-driven data parsing pipeline that automatically extracts relevant narrative information from novels and scripts. We then apply this pipeline to create Story Ribbons, an interactive visualization system that helps novice and expert literary analysts explore detailed character and theme trajectories at multiple narrative levels. Through pipeline evaluations and user studies with Story Ribbons on 36 literary works, we demonstrate the potential of LLMs to streamline narrative visualization creation and reveal new insights about familiar stories. We also describe current limitations of AI-based systems, and interaction motifs designed to address these issues.
Authors:Catalina Gomez, Lalithkumar Seenivasan, Xinrui Zou, Jeewoo Yoon, Sirui Chu, Ariel Leong, Patrick Kramer, Yu-Chun Ku, Jose L. Porras, Alejandro Martin-Gomez, Masaru Ishii, Mathias Unberath
Abstract:
Traditional surgical skill acquisition relies heavily on expert feedback, yet direct access is limited by faculty availability and variability in subjective assessments. While trainees can practice independently, the lack of personalized, objective, and quantitative feedback reduces the effectiveness of self-directed learning. Recent advances in computer vision and machine learning have enabled automated surgical skill assessment, demonstrating the feasibility of automatic competency evaluation. However, it is unclear whether such Artificial Intelligence (AI)-driven feedback can contribute to skill acquisition. Here, we examine the effectiveness of explainable AI (XAI)-generated feedback in surgical training through a human-AI study. We create a simulation-based training framework that utilizes XAI to analyze videos and extract surgical skill proxies related to primitive actions. Our intervention provides automated, user-specific feedback by comparing trainee performance to expert benchmarks and highlighting deviations from optimal execution through understandable proxies for actionable guidance. In a prospective user study with medical students, we compare the impact of XAI-guided feedback against traditional video-based coaching on task outcomes, cognitive load, and trainees' perceptions of AI-assisted learning. Results showed improved cognitive load and confidence post-intervention. While no differences emerged between the two feedback types in reducing performance gaps or practice adjustments, trends in the XAI group revealed desirable effects where participants more closely mimicked expert practice. This work encourages the study of explainable AI in surgical education and the development of data-driven, adaptive feedback mechanisms that could transform learning experiences and competency assessment.
Authors:Rebecca Yu, Valerie Chen, Ameet Talwalkar, Hoda Heidari
Abstract:
Growing excitement around deploying AI across various domains calls for a careful assessment of how human decision-makers interact with AI-powered systems. In particular, it is essential to understand when decision-makers voluntarily choose to consult AI tools, which we term decision-maker adoption. We interviewed experts across four domains -- medicine, law, journalism, and the public sector -- to explore current AI use cases and perceptions of adoption. From these interviews, we identify key factors that shape decision-maker adoption of AI tools: the decision-maker's background, perceptions of the AI, consequences for the decision-maker, and perceived implications for other stakeholders. We translate these factors into an AI adoption sheet to analyze how decision-makers approach adoption choices through comparative, cross-domain case studies, highlighting how our factors help explain inter-domain differences in adoption. Our findings offer practical guidance for supporting the responsible and context-aware deployment of AI by better accounting for the decision-maker's perspective.
Authors:Dennis Zyska, Ilia Kuznetsov, Florian Müller, Iryna Gurevych
Abstract:
Educational technologies often misalign with instructors' pedagogical goals, forcing adaptations that compromise teaching efficacy. In this paper, we present a case study on the co-development of curriculum and technology in the context of a university course on scientific writing. Specifically, we examine how a custom-built peer feedback system was iteratively developed alongside the course to support annotation, feedback exchange, and revision. Results show that while co-development fostered stronger alignment between software features and course goals, it also exposed usability limitations and infrastructure-related frustrations, emphasizing the need for closer coordination between teaching and technical teams.
Authors:Lalithkumar Seenivasan, Jiru Xu, Roger D. Soberanis Mukul, Hao Ding, Grayson Byrd, Yu-Chun Ku, Jose L. Porras, Masaru Ishii, Mathias Unberath
Abstract:
Emerging surgical data science and robotics solutions, especially those designed to provide assistance in situ, require natural human-machine interfaces to fully unlock their potential in providing adaptive and intuitive aid. Contemporary AI-driven solutions remain inherently rigid, offering limited flexibility and restricting natural human-machine interaction in dynamic surgical environments. These solutions rely heavily on extensive task-specific pre-training, fixed object categories, and explicit manual-prompting. This work introduces a novel Perception Agent that leverages speech-integrated prompt-engineered large language models (LLMs), segment anything model (SAM), and any-point tracking foundation models to enable a more natural human-machine interaction in real-time intraoperative surgical assistance. Incorporating a memory repository and two novel mechanisms for segmenting unseen elements, Perception Agent offers the flexibility to segment both known and unseen elements in the surgical scene through intuitive interaction. Incorporating the ability to memorize novel elements for use in future surgeries, this work takes a marked step towards human-machine symbiosis in surgical procedures. Through quantitative analysis on a public dataset, we show that the performance of our agent is on par with considerably more labor-intensive manual-prompting strategies. Qualitatively, we show the flexibility of our agent in segmenting novel elements (instruments, phantom grafts, and gauze) in a custom-curated dataset. By offering natural human-machine interaction and overcoming rigidity, our Perception Agent potentially brings AI-based real-time assistance in dynamic surgical environments closer to reality.
Authors:Jonas Hummel, Tobias Röddiger, Valeria Zitz, Philipp Lepold, Michael Küttner, Marius Prill, Christopher Clarke, Hans Gellersen, Michael Beigl
Abstract:
Interaction with earables - earphones equipped with additional sensors - has been identified as one of four major areas of earable research. Worn naturally and positioned near key physiological signals, earables support a wide range of interaction modalities and have demonstrated the ability to detect multiple inputs simultaneously. Yet this diversity has resulted in a fragmented body of research, making it increasingly difficult to track developments and identify relevant studies. To address this, we introduce EarXplore, a curated, interactive online database on earable interaction research. Designed through a question-centered process that guided both the development of 34 criteria applied to annotate 118 studies and the structure of the platform, EarXplore comprises four distinct yet integrated views: a Tabular View for structured exploration, a Graphical View for visual overviews, a Similarity View for identifying conceptual links, and a Timeline View for analyzing trends and scholarly lineage. We demonstrate how the platform supports tailored exploration, targeted filtering, and interactive information retrieval, allowing researchers to query the literature and synthesize information in the format of their choice. We furthermore leverage the contents and capabilities of the platform to discuss the research gaps and opportunities in the field. With built-in mechanisms for continuous community updates, EarXplore not only reflects the current state of the field but also evolves alongside it, serving as a living resource to inform and accelerate future developments.
Authors:Ana Davila, Jacinto Colan, Yasuhisa Hasegawa
Abstract:
Large Language Models (LLMs) have demonstrated significant capabilities in understanding and generating human language, contributing to more natural interactions with complex systems. However, they face challenges such as ambiguity in user requests processed by LLMs. To address these challenges, this paper introduces and evaluates a multi-agent debate framework designed to enhance detection and resolution capabilities beyond single models. The framework consists of three LLM architectures (Llama3-8B, Gemma2-9B, and Mistral-7B variants) and a dataset with diverse ambiguities. The debate framework markedly enhanced the performance of Llama3-8B and Mistral-7B variants over their individual baselines, with Mistral-7B-led debates achieving a notable 76.7% success rate and proving particularly effective for complex ambiguities and efficient consensus. While acknowledging varying model responses to collaborative strategies, these findings underscore the debate framework's value as a targeted method for augmenting LLM capabilities. This work offers important insights for developing more robust and adaptive language understanding systems by showing how structured debates can lead to improved clarity in interactive systems.
Authors:Ana Davila, Jacinto Colan, Yasuhisa Hasegawa
Abstract:
Ambiguity in natural language instructions poses significant risks in safety-critical human-robot interaction, particularly in domains such as surgery. To address this, we propose a framework that uses Large Language Models (LLMs) for ambiguity detection specifically designed for collaborative surgical scenarios. Our method employs an ensemble of LLM evaluators, each configured with distinct prompting techniques to identify linguistic, contextual, procedural, and critical ambiguities. A chain-of-thought evaluator is included to systematically analyze instruction structure for potential issues. Individual evaluator assessments are synthesized through conformal prediction, which yields non-conformity scores based on comparison to a labeled calibration dataset. Evaluating Llama 3.2 11B and Gemma 3 12B, we observed classification accuracy exceeding 60% in differentiating ambiguous from unambiguous surgical instructions. Our approach improves the safety and reliability of human-robot collaboration in surgery by offering a mechanism to identify potentially ambiguous instructions before robot action.
Authors:Jacinto Colan, Ana Davila, Yutaro Yamada, Yasuhisa Hasegawa
Abstract:
Human-robot collaboration in surgery represents a significant area of research, driven by the increasing capability of autonomous robotic systems to assist surgeons in complex procedures. This systematic review examines the advancements and persistent challenges in the development of autonomous surgical robotic assistants (ASARs), focusing specifically on scenarios where robots provide meaningful and active support to human surgeons. Adhering to the PRISMA guidelines, a comprehensive literature search was conducted across the IEEE Xplore, Scopus, and Web of Science databases, resulting in the selection of 32 studies for detailed analysis. Two primary collaborative setups were identified: teleoperation-based assistance and direct hands-on interaction. The findings reveal a growing research emphasis on ASARs, with predominant applications currently in endoscope guidance, alongside emerging progress in autonomous tool manipulation. Several key challenges hinder wider adoption, including the alignment of robotic actions with human surgeon preferences, the necessity for procedural awareness within autonomous systems, the establishment of seamless human-robot information exchange, and the complexities of skill acquisition in shared workspaces. This review synthesizes current trends, identifies critical limitations, and outlines future research directions essential to improve the reliability, safety, and effectiveness of human-robot collaboration in surgical environments.
Authors:Jueun Lee, Martin Flipe, Philipp Lepold, Tobias Röddiger, Michael Beigl
Abstract:
Wearable haptic interventions offer promising support for relaxation through slow, vibrotactile biofeedback. Despite their potential, current applications focus on stress-inducing procedures and fixed vibration patterns, with limited consideration of body location and dynamic biofeedback during restful states. This study investigates the effects of haptic biofeedback adjusted from real-time heart rate during eyes-closed wakeful rest, comparing four wearable body placements: the wrist, hand, forearm, and shoulder. Heart rate, alpha wave activity on the ear, subjective restfulness, and vibration experience were measured across these conditions. Results show that biofeedback reduced heart rate at the wrist, shoulder, and forearm, while alpha power measured at the ear remained unchanged. Subjective restfulness was rated highest at the shoulder and forearm, which were also the most preferred locations. In addition, participants reported greater comfort, relaxation, and further increased sleepiness at the forearm compared to the wrist, which was more easily recognizable. These findings suggest that the forearm and shoulder are ideal for unobtrusive relaxation feedback for wakeful rest, while the wrist may require design improvements for subjective experience.
Authors:Jueun Lee, Dennis Moschina, Supraja Ramesh, Tobias Röddiger, Kai Kunze, Michael Beigl
Abstract:
We investigate the use of musically structured, closed-loop vibration patterns as a passive biofeedback intervention for relaxation and sleep initiation. By encoding rhythmic meter structures into smartwatch vibrations and adapting their frequency to be slightly slower than the user's real-time heart rate, our system aims to reduce arousal through tactile entrainment, offering a non-invasive alternative to auditory or open-loop approaches previously used in sleep and anxiety contexts. In the first study (N=20), we compared five adaptive vibration rhythms for their effects on heart rate and subjective perceptions of relaxation in a resting context. In the second study (N=28), we evaluated the most promising pattern from Study 1 in a prolonged sleep initiation setting. Results showed increased parasympathetic activity and perceived relaxation during short-term stimulation, but no significant effects on sleep-related measures during the sleep onset phase. This work contributes to the understanding of how wearable haptic feedback can support relaxation and sleep, offering design insights and identifying methodological considerations for effectively integrating haptic interaction into self-directed interventions.
Authors:Yanwei Huang, Wesley Hanwen Deng, Sijia Xiao, Motahhare Eslami, Jason I. Hong, Arpit Narechania, Adam Perer
Abstract:
Despite their increasing capabilities, text-to-image generative AI systems are known to produce biased, offensive, and otherwise problematic outputs. While recent advancements have supported testing and auditing of generative AI, existing auditing methods still face challenges in supporting effectively explore the vast space of AI-generated outputs in a structured way. To address this gap, we conducted formative studies with five AI auditors and synthesized five design goals for supporting systematic AI audits. Based on these insights, we developed Vipera, an interactive auditing interface that employs multiple visual cues including a scene graph to facilitate image sensemaking and inspire auditors to explore and hierarchically organize the auditing criteria. Additionally, Vipera leverages LLM-powered suggestions to facilitate exploration of unexplored auditing directions. Through a controlled experiment with 24 participants experienced in AI auditing, we demonstrate Vipera's effectiveness in helping auditors navigate large AI output spaces and organize their analyses while engaging with diverse criteria.
Authors:Xinying Hou, Ruiwei Xiao, Runlong Ye, Michael Liut, John Stamper
Abstract:
The broad adoption of Generative AI (GenAI) is impacting Computer Science education, and recent studies found its benefits and potential concerns when students use it for programming learning. However, most existing explorations focus on GenAI tools that primarily support text-to-text interaction. With recent developments, GenAI applications have begun supporting multiple modes of communication, known as multimodality. In this work, we explored how undergraduate programming novices choose and work with multimodal GenAI tools, and their criteria for choices. We selected a commercially available multimodal GenAI platform for interaction, as it supports multiple input and output modalities, including text, audio, image upload, and real-time screen-sharing. Through 16 think-aloud sessions that combined participant observation with follow-up semi-structured interviews, we investigated student modality choices for GenAI tools when completing programming problems and the underlying criteria for modality selections. With multimodal communication emerging as the future of AI in education, this work aims to spark continued exploration on understanding student interaction with multimodal GenAI in the context of CS education.
Authors:Zhiping Zhang, Yi Evie Zhang, Freda Shi, Tianshi Li
Abstract:
Large Language Model (LLM) agents require personal information for personalization in order to better act on users' behalf in daily tasks, but this raises privacy concerns and a personalization-privacy dilemma. Agent's autonomy introduces both risks and opportunities, yet its effects remain unclear. To better understand this, we conducted a 3$\times$3 between-subjects experiment ($N=450$) to study how agent's autonomy level and personalization influence users' privacy concerns, trust and willingness to use, as well as the underlying psychological processes. We find that personalization without considering users' privacy preferences increases privacy concerns and decreases trust and willingness to use. Autonomy moderates these effects: Intermediate autonomy flattens the impact of personalization compared to No- and Full autonomy conditions. Our results suggest that rather than aiming for perfect model alignment in output generation, balancing autonomy of agent's action and user control offers a promising path to mitigate the personalization-privacy dilemma.
Authors:Shiyi Zhang, Dong Liang, Yihang Zhou
Abstract:
Reconstructing visual information from brain activity via computer vision technology provides an intuitive understanding of visual neural mechanisms. Despite progress in decoding fMRI data with generative models, achieving accurate cross-subject reconstruction of visual stimuli remains challenging and computationally demanding. This difficulty arises from inter-subject variability in neural representations and the brain's abstract encoding of core semantic features in complex visual inputs. To address these challenges, we propose NeuroSwift, which integrates complementary adapters via diffusion: AutoKL for low-level features and CLIP for semantics. NeuroSwift's CLIP Adapter is trained on Stable Diffusion generated images paired with COCO captions to emulate higher visual cortex encoding. For cross-subject generalization, we pretrain on one subject and then fine-tune only 17 percent of parameters (fully connected layers) for new subjects, while freezing other components. This enables state-of-the-art performance with only one hour of training per subject on lightweight GPUs (three RTX 4090), and it outperforms existing methods.
Authors:Hiroto Tsutsui, Takefumi Hiraki, Yuichi Hiroi, Shoichi Hasegawa
Abstract:
This study quantitatively analyzes the structural characteristics of user communities within Social Virtual Reality (Social VR) platforms supporting head-mounted displays (HMDs), based on large-scale log data. By detecting and evaluating community structures from data on substantial interactions (defined as prolonged co-presence in the same virtual space), we found that Social VR platforms tend to host numerous, relatively small communities characterized by strong internal cohesion and limited inter-community connections. This finding contrasts with the large-scale, broadly connected community structures typically observed in conventional Social Networking Services (SNS). Furthermore, we identified a user segment capable of mediating between communities, despite these users not necessarily having numerous direct connections. We term this user segment `community hoppers' and discuss their characteristics. These findings contribute to a deeper understanding of the community structures that emerge within the unique communication environment of Social VR and the roles users play within them.
Authors:Nadia Nahar, Chenyang Yang, Yanxin Chen, Wesley Hanwen Deng, Ken Holstein, Motahhare Eslami, Christian Kästner
Abstract:
Responsible AI (RAI) tools -- checklists, templates, and governance processes -- often engage RAI champions, individuals intrinsically motivated to advocate ethical practices, but fail to reach non-champions, who frequently dismiss them as bureaucratic tasks. To explore this gap, we shadowed meetings and interviewed data scientists at an organization, finding that practitioners perceived RAI as irrelevant to their work. Building on these insights and theoretical foundations, we derived design principles for engaging non-champions, and introduced sticky stories -- narratives of unexpected ML harms designed to be concrete, severe, surprising, diverse, and relevant, unlike widely circulated media to which practitioners are desensitized. Using a compound AI system, we generated and evaluated sticky stories through human and LLM assessments at scale, confirming they embodied the intended qualities. In a study with 29 practitioners, we found that, compared to regular stories, sticky stories significantly increased time spent on harm identification, broadened the range of harms recognized, and fostered deeper reflection.
Authors:Avinash Ajit Nargund, Arthur Caetano, Kevin Yang, Rose Yiwei Liu, Philip Tezaur, Kriteen Shrestha, Qisen Pan, Tobias Höllerer, Misha Sra
Abstract:
Human-AI collaboration is typically offered in one of two of user control levels: guidance, where the AI provides suggestions and the human makes the final decision, and delegation, where the AI acts autonomously within user-defined constraints. Systems that integrate both modes, common in robotic surgery or driving assistance, often overlook shifts in user preferences within a task in response to factors like evolving trust, decision complexity, and perceived control. In this work, we investigate how users dynamically switch between higher and lower levels of control during a sequential decision-making task. Using a hand-and-brain chess setup, participants either selected a piece and the AI decided how it moved (brain mode), or the AI selected a piece and the participant decided how it moved (hand mode). We collected over 400 mode-switching decisions from eight participants, along with gaze, emotional state, and subtask difficulty data. Statistical analysis revealed significant differences in gaze patterns and subtask complexity prior to a switch and in the quality of the subsequent move. Based on these results, we engineered behavioral and task-specific features to train a lightweight model that predicted control level switches ($F1 = 0.65$). The model performance suggests that real-time behavioral signals can serve as a complementary input alongside system-driven mode-switching mechanisms currently used. We complement our quantitative results with qualitative factors that influence switching including perceived AI ability, decision complexity, and level of control, identified from post-game interview analysis. The combined behavioral and modeling insights can help inform the design of shared autonomy systems that need dynamic, subtask-level control switches aligned with user intent and evolving task demands.
Authors:K. J. Kevin Feng, Tzu-Sheng Kuo, Quan Ze, Chen, Inyoung Cheong, Kenneth Holstein, Amy X. Zhang
Abstract:
As LLMs gain adoption in high-stakes domains like mental health, domain experts are increasingly consulted to provide input into policies governing their behavior. From an observation of 19 policymaking workshops with 9 experts over 15 weeks, we identified opportunities to better support rapid experimentation, feedback, and iteration for collaborative policy design processes. We present PolicyPad, an interactive system that facilitates the emerging practice of LLM policy prototyping by drawing from established UX prototyping practices, including heuristic evaluation and storyboarding. Using PolicyPad, policy designers can collaborate on drafting a policy in real time while independently testing policy-informed model behavior with usage scenarios. We evaluate PolicyPad through workshops with 8 groups of 22 domain experts in mental health and law, finding that PolicyPad enhanced collaborative dynamics during policy design, enabled tight feedback loops, and led to novel policy contributions. Overall, our work paves participatory paths for advancing AI alignment and safety.
Authors:Kihoon Son, DaEun Choi, Tae Soo Kim, Young-Ho Kim, Sangdoo Yun, Juho Kim
Abstract:
Capturing professionals' decision-making in creative workflows is essential for reflection, collaboration, and knowledge sharing, yet existing methods often leave rationales incomplete and implicit decisions hidden. To address this, we present CLEAR framework that structures reasoning into cognitive decision steps-linked units of actions, artifacts, and self-explanations that make decisions traceable. Building on this framework, we introduce ClearFairy, a think-aloud AI assistant for UI design that detects weak explanations, asks lightweight clarifying questions, and infers missing rationales to ease the knowledge-sharing burden. In a study with twelve creative professionals, 85% of ClearFairy's inferred rationales were accepted, increasing strong explanations from 14% to over 83% of decision steps without adding cognitive demand. The captured steps also enhanced generative AI agents in Figma, yielding next-action predictions better aligned with professionals and producing more coherent design outcomes. For future research on human knowledge-grounded creative AI agents, we release a dataset of captured 417 decision steps.
Authors:Siyeon Bak, Dongyun Han, Inho Jo, Sun-Jeong Kim, Isaac Cho
Abstract:
While Virtual Reality (VR) systems have become increasingly immersive, they still rely predominantly on visual input, which can constrain perceptual performance when visual information is limited. Incorporating additional sensory modalities, such as sound and scent, offers a promising strategy to enhance user experience and overcome these limitations. This paper investigates the contribution of auditory and olfactory cues in supporting perception within the portal metaphor, a VR technique that reveals remote environments through narrow, visually constrained transitions. We conducted a user study in which participants identified target scenes by selecting the correct portal among alternatives under varying sensory conditions. The results demonstrate that integrating visual, auditory, and olfactory cues significantly improved both recognition accuracy and response time. These findings highlight the potential of multisensory integration to compensate for visual constraints in VR and emphasize the value of incorporating sound and scent to enhance perception, immersion, and interaction within future VR system designs.
Authors:Dongyun Han, Siyeon Bak, So-Hui Kim, Kangsoo Kim, Sun-Jeong Kim, Isaac Cho
Abstract:
Incorporating multi-sensory cues into Virtual Reality (VR) can significantly enhance user experiences, mirroring the multi-sensory interactions we encounter in the real-world. Olfaction plays a crucial role in shaping impressions when engaging with others. This study examines how non-verbal cues from virtual agents-specifically olfactory cues, emotional expressions, and gender-influence user perceptions during encounters with virtual agents. Our findings indicate that in unscented, woodsy, and floral scent conditions, participants primarily relied on visually observable cues to form their impressions of virtual agents. Positive emotional expressions, conveyed through facial expressions and gestures, contributed to more favorable impressions, with this effect being stronger for the female agent than the male agent. However, in the unpleasant scent condition, participants consistently formed negative impressions, which overpowered the influence of emotional expressions and gender, suggesting that aversive olfactory stimuli can detrimentally impact user perceptions. Our results emphasize the importance of carefully selecting olfactory stimuli when designing immersive and engaging VR interactions. Finally, we present our findings and outline future research directions for effectively integrating olfactory cues into virtual agents.
Authors:Zhejing Hu, Yan Liu, Zhi Zhang, Gong Chen, Bruce X. B. Yu, Jiannong Cao
Abstract:
In the era of human-AI co-creation, the maxim "knowing is easy, doing is hard" is redefined. AI has the potential to ease execution, yet the essence of "hard" lies in who governs the translation from knowing to doing. Mainstream tools often centralize interpretive authority and homogenize expression, suppressing marginal voices. To address these challenges, we introduce the first systematic framework for redistributing authority in the knowing-doing cycle, built on three principles, namely contestability, agency, and plurality. Through interactive studies with 180 music practitioners, complemented by in-depth interviews, we demonstrate that these principles reshape human-AI authority relations and reactivate human creative expression. The findings establish a new paradigm for critical computing and human-AI co-creation that advances from critique to practice.
Authors:Zhejing Hu, Yan Liu, Zhi Zhang, Gong Chen, Bruce X. B. Yu, Junxian Li, Jiannong Cao
Abstract:
Adolescence is marked by strong creative impulses but limited strategies for structured expression, often leading to frustration or disengagement. While generative AI lowers technical barriers and delivers efficient outputs, its role in fostering adolescents' expressive growth has been overlooked. We propose MusicScaffold, the first adolescent-centered framework that repositions AI as a guide, coach, and partner, making expressive strategies transparent and learnable, and supporting autonomy. In a four-week study with middle school students (ages 12--14), MusicScaffold enhanced cognitive specificity, behavioral self-regulation, and affective confidence in music creation. By reframing generative AI as a scaffold rather than a generator, this work bridges the machine efficiency of generative systems with human growth in adolescent creative education.
Authors:Ambre Assor, Hyeon Jeon, Sungbok Shin, Jean-Daniel Fekete
Abstract:
We propose leveraging Large Language Models (LLMs) as an interaction layer for medical visualization systems. In domains like healthcare, where users must navigate high-dimensional, coded, and heterogeneous datasets, LLM-generated queries enable expert medical users to express complex analytical intents in natural language. These intents are then translated into editable and executable queries, replacing the dynamic query interfaces used by traditional visualization systems built around sliders, check boxes, and drop-downs. This interaction model reduces visual clutter and eliminates the need for users to memorize field names or system codes, supporting fluid exploration, with the drawback of not exposing all the filtering criteria. We also reintroduce dynamic queries on demand to better support interactive exploration. We posit that medical users are trained to know the possible filtering options but challenged to remember the details of the attribute names and code values. We demonstrate this paradigm in ParcoursVis, our scalable EventFlow-inspired patient care pathway visualization system powered by the French National Health Data System, one of the largest health data repositories in the world.
Authors:Maximilian Warsinke, Francesco Vona, Tanja KojiÄ, Jan-Niklas Voigt-Antons, Sebastian Möller
Abstract:
This study evaluates the user experience (UX) in extended reality (XR) tourism of two digital twin-based applications: an Augmented Reality Virtual Tour (AR-VT) for enhanced on-site visits and a Virtual Reality Virtual Tour (VR-VT) for remote exploration. Using a quantitative exploratory approach, 84 participants from Spain and Germany, divided into three sample groups, assessed UX, task load, presence, cybersickness, and emotional response through standardized questionnaires. Findings indicate that both applications provided a low task load and high enjoyment. The VR-based tour enhanced presence but posed usability and cybersickness challenges, while the AR-based tour achieved high UX ratings, with qualitative feedback suggesting areas for refinement. Correlation analysis revealed significant relationships between age, prior XR experience, and technological affinity with the measured metrics for both applications. These results highlight the importance of well-designed experiences tailored to XR novices, reinforcing the critical role of UX in digital twin-based XR tourism.
Authors:Wesley Hanwen Deng, Sunnie S. Y. Kim, Akshita Jha, Ken Holstein, Motahhare Eslami, Lauren Wilcox, Leon A Gatys
Abstract:
Recent developments in AI governance and safety research have called for red-teaming methods that can effectively surface potential risks posed by AI models. Many of these calls have emphasized how the identities and backgrounds of red-teamers can shape their red-teaming strategies, and thus the kinds of risks they are likely to uncover. While automated red-teaming approaches promise to complement human red-teaming by enabling larger-scale exploration of model behavior, current approaches do not consider the role of identity. As an initial step towards incorporating people's background and identities in automated red-teaming, we develop and evaluate a novel method, PersonaTeaming, that introduces personas in the adversarial prompt generation process to explore a wider spectrum of adversarial strategies. In particular, we first introduce a methodology for mutating prompts based on either "red-teaming expert" personas or "regular AI user" personas. We then develop a dynamic persona-generating algorithm that automatically generates various persona types adaptive to different seed prompts. In addition, we develop a set of new metrics to explicitly measure the "mutation distance" to complement existing diversity measurements of adversarial prompts. Our experiments show promising improvements (up to 144.1%) in the attack success rates of adversarial prompts through persona mutation, while maintaining prompt diversity, compared to RainbowPlus, a state-of-the-art automated red-teaming method. We discuss the strengths and limitations of different persona types and mutation methods, shedding light on future opportunities to explore complementarities between automated and human red-teaming approaches.
Authors:Sina Hinzmann, Francesco Vona, Juliane Henning, Mohamed Amer, Omar Abdellatif, Tanja Kojic, Jan-Niklas Voigt-Antons
Abstract:
As augmented reality (AR) becomes increasingly prevalent in mobile and context-aware applications, the role of auditory cues in guiding users through physical environments is becoming critical. This study investigates the effectiveness and user experience of various categories of audio cues, including fully non-verbal sounds and speech-derived Spearcons, during outdoor navigation tasks using the Meta Quest 3 headset. Twenty participants navigated five outdoor routes using audio-only cue types: Artificial Sounds, Nature Sounds, Spearcons, Musical Instruments, and Auditory Icons. Subjective evaluations were collected to assess the perceived effectiveness and user experience of each sound type. Results revealed significant differences in perceived novelty and stimulation across sound types. Artificial Sounds and Musical Instruments were rated higher than Spearcons in novelty, while Artificial Sounds were also rated higher than Spearcons in stimulation. Overall preference was evenly split between Nature Sounds and Artificial Sounds. These findings suggest that incorporating aspects of novelty and user engagement in auditory feedback design may enhance the effectiveness of AR navigation systems.
Authors:Theresa Pekarek Rosin, Julia Gachot, Henri-Leon Kordt, Matthias Kerzel, Stefan Wermter
Abstract:
Automatic Speech Recognition (ASR) systems in real-world settings need to handle imperfect audio, often degraded by hardware limitations or environmental noise, while accommodating diverse user groups. In human-robot interaction (HRI), these challenges intersect to create a uniquely challenging recognition environment. We evaluate four state-of-the-art ASR systems on eight publicly available datasets that capture six dimensions of difficulty: domain-specific, accented, noisy, age-variant, impaired, and spontaneous speech. Our analysis demonstrates significant variations in performance, hallucination tendencies, and inherent biases, despite similar scores on standard benchmarks. These limitations have serious implications for HRI, where recognition errors can interfere with task performance, user trust, and safety.
Authors:Jiayi Wang, Ruiwei Xiao, Xinying Hou, John Stamper
Abstract:
K-12 educators are increasingly using Large Language Models (LLMs) to create instructional materials. These systems excel at producing fluent, coherent content, but often lack support for high-quality teaching. The reason is twofold: first, commercial LLMs, such as ChatGPT and Gemini which are among the most widely accessible to teachers, do not come preloaded with the depth of pedagogical theory needed to design truly effective activities; second, although sophisticated prompt engineering can bridge this gap, most teachers lack the time or expertise and find it difficult to encode such pedagogical nuance into their requests. This study shifts pedagogical expertise from the user's prompt to the LLM's internal architecture. We embed the well-established Knowledge-Learning-Instruction (KLI) framework into a Multi-Agent System (MAS) to act as a sophisticated instructional designer. We tested three systems for generating secondary Math and Science learning activities: a Single-Agent baseline simulating typical teacher prompts; a role-based MAS where agents work sequentially; and a collaborative MAS-CMD where agents co-construct activities through conquer and merge discussion. The generated materials were evaluated by 20 practicing teachers and a complementary LLM-as-a-judge system using the Quality Matters (QM) K-12 standards. While the rubric scores showed only small, often statistically insignificant differences between the systems, the qualitative feedback from educators painted a clear and compelling picture. Teachers strongly preferred the activities from the collaborative MAS-CMD, describing them as significantly more creative, contextually relevant, and classroom-ready. Our findings show that embedding pedagogical principles into LLM systems offers a scalable path for creating high-quality educational content.
Authors:Ruiwei Xiao, Xinying Hou, Ying-Jui Tseng, Hsuan Nieu, Guanze Liao, John Stamper, Kenneth R. Koedinger
Abstract:
As Artificial Intelligence (AI) becomes increasingly integrated into daily life, there is a growing need to equip the next generation with the ability to apply, interact with, evaluate, and collaborate with AI systems responsibly. Prior research highlights the urgent demand from K-12 educators to teach students the ethical and effective use of AI for learning. To address this need, we designed an Large-Language Model (LLM)-based module to teach prompting literacy. This includes scenario-based deliberate practice activities with direct interaction with intelligent LLM agents, aiming to foster secondary school students' responsible engagement with AI chatbots. We conducted two iterations of classroom deployment in 11 authentic secondary education classrooms, and evaluated 1) AI-based auto-grader's capability; 2) students' prompting performance and confidence changes towards using AI for learning; and 3) the quality of learning and assessment materials. Results indicated that the AI-based auto-grader could grade student-written prompts with satisfactory quality. In addition, the instructional materials supported students in improving their prompting skills through practice and led to positive shifts in their perceptions of using AI for learning. Furthermore, data from Study 1 informed assessment revisions in Study 2. Analyses of item difficulty and discrimination in Study 2 showed that True/False and open-ended questions could measure prompting literacy more effectively than multiple-choice questions for our target learners. These promising outcomes highlight the potential for broader deployment and highlight the need for broader studies to assess learning effectiveness and assessment design.
Authors:Kangyu Wang, Hongliang He, Lin Liu, Ruiqi Liang, Zhenzhong Lan, Jianguo Li
Abstract:
Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have ushered in a new era of AI capabilities, demonstrating near-human-level performance across diverse scenarios. While numerous benchmarks (e.g., MMLU) and leaderboards (e.g., Chatbot Arena) have been proposed to help evolve the development of LLMs and MLLMs, most rely on static datasets or crowdsourced general-domain prompts, often falling short of reflecting performance in real-world applications. To bridge this critical gap, we present Inclusion Arena, a live leaderboard that ranks models based on human feedback collected directly from AI-powered applications. Our platform integrates pairwise model comparisons into natural user interactions, ensuring evaluations reflect practical usage scenarios. For robust model ranking, we employ the Bradley-Terry model augmented with two key innovations: (1) Placement Matches, a cold-start mechanism to quickly estimate initial ratings for newly integrated models, and (2) Proximity Sampling, an intelligent comparison strategy that prioritizes battles between models of similar capabilities to maximize information gain and enhance rating stability. Extensive empirical analyses and simulations demonstrate that Inclusion Arena yields reliable and stable rankings, exhibits higher data transitivity compared to general crowdsourced datasets, and significantly mitigates the risk of malicious manipulation. By fostering an open alliance between foundation models and real-world applications, Inclusion Arena aims to accelerate the development of LLMs and MLLMs truly optimized for practical, user-centric deployments. The platform is publicly accessible at https://www.tbox.cn/about/model-ranking.
Authors:Tao Long, Sitong Wang, Ãmilie Fabre, Tony Wang, Anup Sathya, Jason Wu, Savvas Petridis, Dingzeyu Li, Tuhin Chakrabarty, Yue Jiang, Jingyi Li, Tiffany Tseng, Ken Nakagaki, Qian Yang, Nikolas Martelaro, Jeffrey V. Nickerson, Lydia B. Chilton
Abstract:
UIST researchers develop tools to address user challenges. However, user interactions with AI evolve over time through learning, adaptation, and repurposing, making one time evaluations insufficient. Capturing these dynamics requires longer-term studies, but challenges in deployment, evaluation design, and data collection have made such longitudinal research difficult to implement. Our workshop aims to tackle these challenges and prepare researchers with practical strategies for longitudinal studies. The workshop includes a keynote, panel discussions, and interactive breakout groups for discussion and hands-on protocol design and tool prototyping sessions. We seek to foster a community around longitudinal system research and promote it as a more embraced method for designing, building, and evaluating UIST tools.
Authors:Masanori Ibara, Yuichi Hiroi, Takushi Kamegai, Takefumi Hiraki
Abstract:
Traditionally, specialized 3D design data, such as BIM and CAD, have been accessible only to a select group of experts, creating significant barriers that prevent general users from participating in decision-making processes. This paper provides a systematic overview of practical insights for utilizing 3D data in industrial and architectural domains by presenting implementation cases of the industrial metaverse on Cluster, a commercial cross-device metaverse platform. This paper analyzes the characteristics and constraints of major data formats in the industrial and architectural fields and organizes integration workflows for the metaverse. Through application cases utilizing 3D data across multiple domains, we present practical examples of collaborative decision-making support enabled by the fusion of metaverse and digital twin technologies. Specifically, we demonstrate that multi-device access and simultaneous multi-user participation capabilities foster democratic environments in the industrial metaverse, which are challenging to achieve with conventional, expert-dependent systems.
Authors:Luis Morales-Navarro, Michelle Gan, Evelyn Yu, Lauren Vogelstein, Yasmin B. Kafai, Danaé Metaxa
Abstract:
This study investigates how high school-aged youth engage in algorithm auditing to identify and understand biases in artificial intelligence and machine learning (AI/ML) tools they encounter daily. With AI/ML technologies being increasingly integrated into young people's lives, there is an urgent need to equip teenagers with AI literacies that build both technical knowledge and awareness of social impacts. Algorithm audits (also called AI audits) have traditionally been employed by experts to assess potential harmful biases, but recent research suggests that non-expert users can also participate productively in auditing. We conducted a two-week participatory design workshop with 14 teenagers (ages 14-15), where they audited the generative AI model behind TikTok's Effect House, a tool for creating interactive TikTok filters. We present a case study describing how teenagers approached the audit, from deciding what to audit to analyzing data using diverse strategies and communicating their results. Our findings show that participants were engaged and creative throughout the activities, independently raising and exploring new considerations, such as age-related biases, that are uncommon in professional audits. We drew on our expertise in algorithm auditing to triangulate their findings as a way to examine if the workshop supported participants to reach coherent conclusions in their audit. Although the resulting number of changes in race, gender, and age representation uncovered by the teens were slightly different from ours, we reached similar conclusions. This study highlights the potential for auditing to inspire learning activities to foster AI literacies, empower teenagers to critically examine AI systems, and contribute fresh perspectives to the study of algorithmic harms.
Authors:Arthur Caetano, Radha Kumaran, Kelvin Jou, Tobias Höllerer, Misha Sra
Abstract:
Artificial intelligence (AI) and extended reality (XR) are increasingly combined in applications such as motor skill training, personalized feedback, and embodied task guidance. Yet developing AI-XR systems remains challenging due to fragmented toolchains that push developers into ad hoc integrations, diverting their attention away from essential design concerns such as interactivity and context awareness. To address this issue, we present XARP (XR Agent-ready Remote Procedures), a toolkit for AI-XR development designed for both human developers and AI agents. XARP implements JSON-based remote procedure calls that allow server-side Python to control XR clients, providing a high-level abstraction over low-level integration details. Humans can use XARP as a Python library to write XR applications with reduced implementation overhead. AI agents operate with the same abstraction to dynamically call tools to generate XR applications at runtime in response to context changes and user requests. XARP offers Model Context Protocol (MCP) connectivity that allows third-party agents and tools to leverage XR capabilities, previously unavailable. We conducted three case studies that demonstrate XARP supports a variety of AI-XR applications, including AI-guided fencing, drone assistance, and room layout design. We evaluated XARP in a walkthrough study with 24 AI and XR developers. UTAUT scores indicate high potential for adoption, and participants reported that XARP can reduce authoring time, lower entry barriers for developers unfamiliar with AI or XR, and enable the implementation of novel AI-XR systems.
Authors:Hikari Yanagawa, Yuichi Hiroi, Satomi Tokida, Yuji Hatada, Takefumi Hiraki
Abstract:
While commercial metaverse platforms offer diverse user-generated content, they lack effective navigation assistance that can dynamically adapt to users' interests and intentions. Although previous research has investigated on-demand agents in controlled environments, implementation in commercial settings with diverse world configurations and platform constraints remains challenging.
We present Navigation Pixie, an on-demand navigation agent employing a loosely coupled architecture that integrates structured spatial metadata with LLM-based natural language processing while minimizing platform dependencies, which enables experiments on the extensive user base of commercial metaverse platforms. Our cross-platform experiments on commercial metaverse platform Cluster with 99 PC client and 94 VR-HMD participants demonstrated that Navigation Pixie significantly increased dwell time and free exploration compared to fixed-route and no-agent conditions across both platforms. Subjective evaluations revealed consistent on-demand preferences in PC environments versus context-dependent social perception advantages in VR-HMD. This research contributes to advancing VR interaction design through conversational spatial navigation agents, establishes cross-platform evaluation methodologies revealing environment-dependent effectiveness, and demonstrates empirical experimentation frameworks for commercial metaverse platforms.
Authors:Juntong Chen, Huayuan Ye, He Zhu, Siwei Fu, Changbo Wang, Chenhui Li
Abstract:
Accurate and reliable visualization of spatiotemporal sensor data such as environmental parameters and meteorological conditions is crucial for informed decision-making. Traditional spatial interpolation methods, however, often fall short of producing reliable interpolation results due to the limited and irregular sensor coverage. This paper introduces a novel spatial interpolation pipeline that achieves reliable interpolation results and produces a novel heatmap representation with uncertainty information encoded. We leverage imputation reference data from Graph Neural Networks (GNNs) to enhance visualization reliability and temporal resolution. By integrating Principal Neighborhood Aggregation (PNA) and Geographical Positional Encoding (GPE), our model effectively learns the spatiotemporal dependencies. Furthermore, we propose an extrinsic, static visualization technique for interpolation-based heatmaps that effectively communicates the uncertainties arising from various sources in the interpolated map. Through a set of use cases, extensive evaluations on real-world datasets, and user studies, we demonstrate our model's superior performance for data imputation, the improvements to the interpolant with reference data, and the effectiveness of our visualization design in communicating uncertainties.
Authors:Yuemin Mao, Uksang Yoo, Yunchao Yao, Shahram Najam Syed, Luca Bondi, Jonathan Francis, Jean Oh, Jeffrey Ichnowski
Abstract:
Accurately estimating hand pose and hand-object contact events is essential for robot data-collection, immersive virtual environments, and biomechanical analysis, yet remains challenging due to visual occlusion, subtle contact cues, limitations in vision-only sensing, and the lack of accessible and flexible tactile sensing. We therefore introduce VibeMesh, a novel wearable system that fuses vision with active acoustic sensing for dense, per-vertex hand contact and pose estimation. VibeMesh integrates a bone-conduction speaker and sparse piezoelectric microphones, distributed on a human hand, emitting structured acoustic signals and capturing their propagation to infer changes induced by contact. To interpret these cross-modal signals, we propose a graph-based attention network that processes synchronized audio spectra and RGB-D-derived hand meshes to predict contact with high spatial resolution. We contribute: (i) a lightweight, non-intrusive visuo-acoustic sensing platform; (ii) a cross-modal graph network for joint pose and contact inference; (iii) a dataset of synchronized RGB-D, acoustic, and ground-truth contact annotations across diverse manipulation scenarios; and (iv) empirical results showing that VibeMesh outperforms vision-only baselines in accuracy and robustness, particularly in occluded or static-contact settings.
Authors:Nan Xiang, Tianyi Liang, Haiwen Huang, Shiqi Jiang, Hao Huang, Yifei Huang, Liangyu Chen, Changbo Wang, Chenhui Li
Abstract:
Text-to-3D (T23D) generation has transformed digital content creation, yet remains bottlenecked by blind trial-and-error prompting processes that yield unpredictable results. While visual prompt engineering has advanced in text-to-image domains, its application to 3D generation presents unique challenges requiring multi-view consistency evaluation and spatial understanding. We present Sel3DCraft, a visual prompt engineering system for T23D that transforms unstructured exploration into a guided visual process. Our approach introduces three key innovations: a dual-branch structure combining retrieval and generation for diverse candidate exploration; a multi-view hybrid scoring approach that leverages MLLMs with innovative high-level metrics to assess 3D models with human-expert consistency; and a prompt-driven visual analytics suite that enables intuitive defect identification and refinement. Extensive testing and user studies demonstrate that Sel3DCraft surpasses other T23D systems in supporting creativity for designers.
Authors:DaEun Choi, Kihoon Son, Jaesang Yu, Hyunjoon Jung, Juho Kim
Abstract:
Generative AI opens new possibilities for design exploration by rapidly generating images aligned with user goals. However, our formative study (N=7) revealed two key challenges that limit broad and efficient exploration with these models: the lack of expressive channels for articulating exploratory directions and ranges, and insufficient support for reusing past intents. We present IdeaBlocks, where users can modularize exploratory intents into Exploration Blocks, capturing property, direction, and range of exploration. Users can reuse prior intents at multiple levels (block, path, and project) with options for literal or context-adaptive reuse. In our comparative study (N=12), participants using IdeaBlocks explored 2.13 times more images with 12.5% greater visual diversity than the baseline, demonstrating how structured intent expression and reuse support more effective exploration. A three-day deployment study (N=6) further revealed how different reuse units and mechanisms enabled distinct creative strategies, offering design implications for future intent-aware creativity support systems.
Authors:Yoonsu Kim, Brandon Chin, Kihoon Son, Seoyoung Kim, Juho Kim
Abstract:
Effective collaboration with generative AI systems requires users to clearly communicate their intents (intent-based outcome specification). Yet such intents are often underspecified and evolve during interaction, dynamic support for intent communication is essential. Through a systematic literature review of 33 papers, we synthesize a structured understanding of intent communication, identifying four key aspects: articulation, exploration, management, and synchronization. Building on these findings, we derived design implications that translate them into actionable design and implemented IntentFlow, a system for LLM-based writing that realizes these implications through adjustable UIs, intent-to-output linking, and versioned refinement. A technical evaluation (N=60) and a within-subjects study (N=12) confirm that IntentFlow helps users discover, elaborate, and consolidate their intents into a curated set. Interaction logs further reveal a shift from reactive error correction to proactive intent refinement. Our work demonstrates how a system effectively designed to support these four communication aspects can substantially enhance human-LLM interaction.
Authors:Yaowei Bai, Ruiheng Zhang, Yu Lei, Jingfeng Yao, Shuguang Ju, Chaoyang Wang, Wei Yao, Yiwan Guo, Guilin Zhang, Chao Wan, Qian Yuan, Xuhua Duan, Xinggang Wang, Tao Sun, Yongchao Xu, Chuansheng Zheng, Huangxuan Zhao, Bo Du
Abstract:
A global shortage of radiologists has been exacerbated by the significant volume of chest X-ray workloads, particularly in primary care. Although multimodal large language models show promise, existing evaluations predominantly rely on automated metrics or retrospective analyses, lacking rigorous prospective clinical validation. Janus-Pro-CXR (1B), a chest X-ray interpretation system based on DeepSeek Janus-Pro model, was developed and rigorously validated through a multicenter prospective trial (NCT06874647). Our system outperforms state-of-the-art X-ray report generation models in automated report generation, surpassing even larger-scale models including ChatGPT 4o (200B parameters), while demonstrating robust detection of eight clinically critical radiographic findings (area under the curve, AUC > 0.8). Retrospective evaluation confirms significantly higher report accuracy than Janus-Pro and ChatGPT 4o. In prospective clinical deployment, AI assistance significantly improved report quality scores (4.37 vs. 4.11, P < 0.001), reduced interpretation time by 18.5% (P < 0.001), and was preferred by a majority of experts (3 out of 5) in 52.7% of cases. Through lightweight architecture and domain-specific optimization, Janus-Pro-CXR improves diagnostic reliability and workflow efficiency, particularly in resource-constrained settings. The model architecture and implementation framework will be open-sourced to facilitate the clinical translation of AI-assisted radiology solutions.
中文: Janus-Pro-CXR 胸部X光智能诊断系统通过前瞻性临床验证,在提升诊断准确性和工作效率方面表现卓越,其轻量化架构特别适合资源有限的环境使用。
English: Janus-Pro-CXR, a lightweight AI system for chest X-ray interpretation, demonstrates superior accuracy in detecting critical findings and generating reports while significantly improving clinical workflow efficiency and diagnostic reliability in prospective trials.
Authors:Lizhi Ma, Tong Zhao, Shuai Zhang, Nirui Song, Hongliang He, Anqi Li, Ran Feng, Huachuan Qiu, Jingsong Ma, Zhenzhong Lan
Abstract:
This study explores the relationship between linguistic expressions and psychological states of depression and anxiety within Chinese psycho-counseling interactions, focusing specifically on the usage of first-person singular pronouns and negative emotional words. Utilizing a corpus derived from 735 online counseling sessions, the analysis employed a general linear mixed-effect model to assess linguistic patterns quantified by the Linguistic Inquiry and Word Count (LIWC) software. Results indicate a significant positive correlation between the frequency of negative emotional words and the severity of both depressive and anxious states among clients. However, contrary to prior findings predominantly derived from English-language contexts, the usage frequency of first-person singular pronouns did not vary significantly with the clients' psychological conditions. These outcomes are discussed within the framework of cultural distinctions between collectivist Chinese contexts and individualistic Western settings, as well as the interactive dynamics unique to psycho-counseling conversations. The findings highlight the nuanced influence of cultural and conversational contexts on language use in mental health communications, providing insights into psycholinguistic markers relevant to therapeutic practices in Chinese-speaking populations.
Authors:Haotian Wu, Shufan Jiang, Chios Chen, Yiyang Feng, Hehai Lin, Heqing Zou, Yao Shu, Yanran Li, Chengwei Qin
Abstract:
As large language models (LLMs) advance in role-playing (RP) tasks, existing benchmarks quickly become obsolete due to their narrow scope, outdated interaction paradigms, and limited adaptability across diverse application scenarios. To address this gap, we introduce FURINA-Builder, a novel multi-agent collaboration pipeline that automatically constructs fully customizable RP benchmarks at any scale. It enables evaluation of arbitrary characters across diverse scenarios and prompt formats, as the first benchmark builder in RP area for adaptable assessment. FURINA-Builder simulates dialogues between a test character and other characters drawn from a well-constructed character-scene pool, while an LLM judge selects fine-grained evaluation dimensions and adjusts the test character's responses into final test utterances. Using this pipeline, we build FURINA-Bench, a new comprehensive role-playing benchmark featuring both established and synthesized test characters, each assessed with dimension-specific evaluation criteria. Human evaluation and preliminary separability analysis justify our pipeline and benchmark design. We conduct extensive evaluations of cutting-edge LLMs and find that o3 and DeepSeek-R1 achieve the best performance on English and Chinese RP tasks, respectively. Across all models, established characters consistently outperform synthesized ones, with reasoning capabilities further amplifying this disparity. Interestingly, we observe that model scale does not monotonically reduce hallucinations. More critically, for reasoning LLMs, we uncover a novel trade-off: reasoning improves RP performance but simultaneously increases RP hallucinations. This trade-off extends to a broader Pareto frontier between RP performance and reliability for all LLMs. These findings demonstrate the effectiveness of FURINA-Builder and the challenge posed by FURINA-Bench.
Authors:Siying Hu, Yaxing Yao, Zhicong Lu
Abstract:
Large Language Models are profoundly changing work patterns in high-risk professional domains, yet their application also introduces severe and underexplored compliance risks. To investigate this issue, we conducted semi-structured interviews with 24 highly-skilled knowledge workers from industries such as law, healthcare, and finance. The study found that these experts are commonly concerned about sensitive information leakage, intellectual property infringement, and uncertainty regarding the quality of model outputs. In response, they spontaneously adopt various mitigation strategies, such as actively distorting input data and limiting the details in their prompts. However, the effectiveness of these spontaneous efforts is limited due to a lack of specific compliance guidance and training for Large Language Models. Our research reveals a significant gap between current NLP tools and the actual compliance needs of experts. This paper positions these valuable empirical findings as foundational work for building the next generation of Human-Centered, Compliance-Driven Natural Language Processing for Regulatory Technology (RegTech), providing a critical human-centered perspective and design requirements for engineering NLP systems that can proactively support expert compliance workflows.
Authors:Keyu He, Tejas Srinivasan, Brihi Joshi, Xiang Ren, Jesse Thomason, Swabha Swayamdipta
Abstract:
When people query Vision-Language Models (VLMs) but cannot see the accompanying visual context (e.g. for blind and low-vision users), augmenting VLM predictions with natural language explanations can signal which model predictions are reliable. However, prior work has found that explanations can easily convince users that inaccurate VLM predictions are correct. To remedy undesirable overreliance on VLM predictions, we propose evaluating two complementary qualities of VLM-generated explanations via two quality scoring functions. We propose Visual Fidelity, which captures how faithful an explanation is to the visual context, and Contrastiveness, which captures how well the explanation identifies visual details that distinguish the model's prediction from plausible alternatives. On the A-OKVQA and VizWiz tasks, these quality scoring functions are better calibrated with model correctness than existing explanation qualities. We conduct a user study in which participants have to decide whether a VLM prediction is accurate without viewing its visual context. We observe that showing our quality scores alongside VLM explanations improves participants' accuracy at predicting VLM correctness by 11.1%, including a 15.4% reduction in the rate of falsely believing incorrect predictions. These findings highlight the utility of explanation quality scores in fostering appropriate reliance on VLM predictions.
Authors:Bipul Thapa, Biplov Paneru, Bishwash Paneru, Khem Narayan Poudyal
Abstract:
This paper presents an Artificial Intelligence (AI) integrated novel approach to Brain-Computer Interface (BCI)-based wheelchair development, utilizing a motor imagery right-left-hand movement mechanism for control. The system is designed to simulate wheelchair navigation based on motor imagery right and left-hand movements using electroencephalogram (EEG) data. A pre-filtered dataset, obtained from an open-source EEG repository, was segmented into arrays of 19x200 to capture the onset of hand movements. The data was acquired at a sampling frequency of 200Hz. The system integrates a Tkinter-based interface for simulating wheelchair movements, offering users a functional and intuitive control system. We propose a BiLSTM-BiGRU model that shows a superior test accuracy of 92.26% as compared with various machine learning baseline models, including XGBoost, EEGNet, and a transformer-based model. The Bi-LSTM-BiGRU attention-based model achieved a mean accuracy of 90.13% through cross-validation, showcasing the potential of attention mechanisms in BCI applications.
Authors:Julian Geheeb, Farhan Abid Ivan, Daniel Dyrda, Miriam Anschütz, Georg Groh
Abstract:
Recent research has demonstrated that large language models (LLMs) can support experts across various domains, including game design. In this study, we examine the utility of medium-sized LLMs, models that operate on consumer-grade hardware typically available in small studios or home environments. We began by identifying ten key aspects that contribute to a strong game concept and used ChatGPT to generate thirty sample game ideas. Three medium-sized LLMs, LLaMA 3.1, Qwen 2.5, and DeepSeek-R1, were then prompted to evaluate these ideas according to the previously identified aspects. A qualitative assessment by two researchers compared the models' outputs, revealing that DeepSeek-R1 produced the most consistently useful feedback, despite some variability in quality. To explore real-world applicability, we ran a pilot study with ten students enrolled in a storytelling course for game development. At the early stages of their own projects, students used our prompt and DeepSeek-R1 to refine their game concepts. The results indicate a positive reception: most participants rated the output as high quality and expressed interest in using such tools in their workflows. These findings suggest that current medium-sized LLMs can provide valuable feedback in early game design, though further refinement of prompting methods could improve consistency and overall effectiveness.
Authors:Cindy Peng, Alice Qian, Linghao Jin, Jieneng Chen, Evans Xu Han, Paul Pu Liang, Hong Shen, Haiyi Zhu, Jane Hsieh
Abstract:
Recent generative AI advances present new possibilities for supporting visual art creation, but how such promise might assist novice artists during early-stage processes requires investigation. How novices adopt or resist these tools can shift the relationship between the art community and generative systems. We interviewed 13 artists to uncover needs in key dimensions during early stages of creation: (1) quicker and better access to references, (2) visualizations of reference combinations, (3) external artistic feedback, and (4) personalized support to learn new techniques and styles. Mapping such needs to state-of-the-art open-sourced advances, we developed a set of six interactive prototypes to expose emerging capabilities to novice artists. Afterward, we conducted co-design workshops with 13 novice visual artists through which artists articulated requirements and tensions for artist-centered AI development. Our work reveals opportunities to design novice-targeted tools that foreground artists' needs, offering alternative visions for generative AI to serve visual creativity.
Authors:Maximilian Warsinke, Maurizio Vergari, Tanja KojiÄ, Daniel Nikulin, Sebastian Möller
Abstract:
This study explores how prior exposure to physical objects influences the quality and realism perception of Digital Twins (DT) with varying levels of fidelity in Virtual Reality (VR). In a mixed experimental design, 24 participants were divided into two equal groups: an exposure group, in which members were shown physical objects before inspecting and rating their replicas in VR, and a control group without prior knowledge. Three objects were presented, each under four fidelity conditions with varying texture resolution and geometric detail. Participants rated perceived quality and realism through in-VR self-reports. Statistical analysis revealed that texture resolution significantly affected realism and quality perception, whereas geometric detail only influenced quality ratings. Investigating the between-factor, no significant effect of exposure on quality and realism perception was found. These findings raise important questions about the cognitive relationship between physical objects and their digital counterparts and how fidelity influences the perception of DTs in VR.
Authors:Andrew Zhu, Chris Callison-Burch
Abstract:
Imagine AI assistants that enhance conversations without interrupting them: quietly providing relevant information during a medical consultation, seamlessly preparing materials as teachers discuss lesson plans, or unobtrusively scheduling meetings as colleagues debate calendars. While modern conversational LLM agents directly assist human users with tasks through a chat interface, we study this alternative paradigm for interacting with LLM agents, which we call "overhearing agents." Rather than demanding the user's attention, overhearing agents continuously monitor ambient activity and intervene only when they can provide contextual assistance. In this paper, we present the first analysis of overhearing LLM agents as a distinct paradigm in human-AI interaction and establish a taxonomy of overhearing agent interactions and tasks grounded in a survey of works on prior LLM-powered agents and exploratory HCI studies. Based on this taxonomy, we create a list of best practices for researchers and developers building overhearing agent systems. Finally, we outline the remaining research gaps and reveal opportunities for future research in the overhearing paradigm.
Authors:Amama Mahmood, Bokyung Kim, Honghao Zhao, Molly E. Atwood, Luis F. Buenaver, Michael T. Smith, Chien-Ming Huang
Abstract:
The sleep diary is a widely used clinical tool for understanding sleep disorders; however, low patient compliance and limited capture of contextual information constrain its effectiveness and leave specialists with an incomplete picture of patients' sleep-related behaviors. In this work, we re-imagine Behavioral Sleep Medicine (BSM) by designing a voice-based conversational sleep diary and specialist-facing visualization tool. Through this design process, we probed specialists' vision of how conversational agents (CAs) could extend beyond diary intake to enhance behavioral sleep medicine. Our multi-stage approach included: (1) interviews with specialists to identify shortcomings in current use of text-based diaries, (2) iterative co-design of a conversational diary and visualization tool, and (3) focus groups to explore the broader potential of CAs in BSM. This work contributes design insights into how CAs can support behavioral interventions, highlights opportunities and challenges for integration into practice, and expands the design space of CAs for behavioral health.
Authors:Hilda Hadan, Michaela Valiquette, Lennart E. Nacke, Leah Zhang-Kennedy
Abstract:
Virtual Reality (VR) technologies offer immersive experiences but collect substantial user data. While deceptive design is well-studied in 2D platforms, little is known about its manifestation in VR environments and its impact on user privacy. This research investigates deceptive designs in privacy communication and interaction mechanisms of 12 top-rated VR games and applications through autoethnographic evaluation of the applications and thematic analysis of privacy policies. We found that while many deceptive designs rely on 2D interfaces, some VR-unique features, while not directly enabling deception, amplified data disclosure behaviors, and obscured actual data practices. Convoluted privacy policies and manipulative consent practices further hinder comprehension and increase privacy risks. We also observed privacy-preserving design strategies and protective considerations in VR privacy policies. We offer recommendations for ethical VR design that balance immersive experiences with strong privacy protections, guiding researchers, designers, and policymakers to improve privacy in VR environments.
Authors:Hao-Shu Fang, Branden Romero, Yichen Xie, Arthur Hu, Bo-Ruei Huang, Juan Alvarez, Matthew Kim, Gabriel Margolis, Kavya Anbarasu, Masayoshi Tomizuka, Edward Adelson, Pulkit Agrawal
Abstract:
We introduce perioperation, a paradigm for robotic data collection that sensorizes and records human manipulation while maximizing the transferability of the data to real robots. We implement this paradigm in DEXOP, a passive hand exoskeleton designed to maximize human ability to collect rich sensory (vision + tactile) data for diverse dexterous manipulation tasks in natural environments. DEXOP mechanically connects human fingers to robot fingers, providing users with direct contact feedback (via proprioception) and mirrors the human hand pose to the passive robot hand to maximize the transfer of demonstrated skills to the robot. The force feedback and pose mirroring make task demonstrations more natural for humans compared to teleoperation, increasing both speed and accuracy. We evaluate DEXOP across a range of dexterous, contact-rich tasks, demonstrating its ability to collect high-quality demonstration data at scale. Policies learned with DEXOP data significantly improve task performance per unit time of data collection compared to teleoperation, making DEXOP a powerful tool for advancing robot dexterity. Our project page is at https://dex-op.github.io.
Authors:Minja Axelsson, Jiaee Cheong, Rune Nyrup, Hatice Gunes
Abstract:
Recent studies indicate that robotic coaches can play a crucial role in promoting wellbeing. However, the real-world deployment of wellbeing robots raises numerous ethical and socio-technical questions and concerns. To explore these questions, we undertake a community-centered investigation to examine three different communities' perspectives on using robotic wellbeing coaches in real-world environments. We frame our work as an anticipatory ethical investigation, which we undertake to better inform the development of robotic technologies with communities' opinions, with the ultimate goal of aligning robot development with public interest. We conducted workshops with three communities who are under-represented in robotics development: 1) members of the public at a science festival, 2) women computer scientists at a conference, and 3) humanities researchers interested in history and philosophy of science. In the workshops, we collected qualitative data using the Social Robot Co-Design Canvas on Ethics. We analysed the collected qualitative data with Thematic Analysis, informed by notes taken during workshops. Through our analysis, we identify four themes regarding key ethical and socio-technical questions about the real-world use of wellbeing robots. We group participants' insights and discussions around these broad thematic questions, discuss them in light of state-of-the-art literature, and highlight areas for future investigation. Finally, we provide the four questions as a broad framework that roboticists can and should use during robotic development and deployment, in order to reflect on the ethics and socio-technical dimensions of their robotic applications, and to engage in dialogue with communities of robot users. The four questions are: 1) Is the robot safe and how can we know that?, 2) Who is the robot built for and with?, 3) Who owns the robot and the data?, and 4) Why a robot?.
Authors:Jinwen Tang, Qiming Guo, Zhicheng Tang, Yi Shang
Abstract:
Educational systems often assume learners can identify their knowledge gaps, yet research consistently shows that students struggle to recognize what they don't know they need to learn-the "unknown unknowns" problem. This paper presents a novel Recursive Prerequisite Knowledge Tracing (RPKT) system that addresses this challenge through dynamic prerequisite discovery using large language models. Unlike existing adaptive learning systems that rely on pre-defined knowledge graphs, our approach recursively traces prerequisite concepts in real-time until reaching a learner's actual knowledge boundary. The system employs LLMs for intelligent prerequisite extraction, implements binary assessment interfaces for cognitive load reduction, and provides personalized learning paths based on identified knowledge gaps. Demonstration across computer science domains shows the system can discover multiple nested levels of prerequisite dependencies, identify cross-domain mathematical foundations, and generate hierarchical learning sequences without requiring pre-built curricula. Our approach shows great potential for advancing personalized education technology by enabling truly adaptive learning across any academic domain.
Authors:Abdul Basit, Maha Nawaz, Saim Rehman, Muhammad Shafique
Abstract:
Efficient control of prosthetic limbs via non-invasive brain-computer interfaces (BCIs) requires advanced EEG processing, including pre-filtering, feature extraction, and action prediction, performed in real time on edge AI hardware. Achieving this on resource-constrained devices presents challenges in balancing model complexity, computational efficiency, and latency. We present CognitiveArm, an EEG-driven, brain-controlled prosthetic system implemented on embedded AI hardware, achieving real-time operation without compromising accuracy. The system integrates BrainFlow, an open-source library for EEG data acquisition and streaming, with optimized deep learning (DL) models for precise brain signal classification. Using evolutionary search, we identify Pareto-optimal DL configurations through hyperparameter tuning, optimizer analysis, and window selection, analyzed individually and in ensemble configurations. We apply model compression techniques such as pruning and quantization to optimize models for embedded deployment, balancing efficiency and accuracy. We collected an EEG dataset and designed an annotation pipeline enabling precise labeling of brain signals corresponding to specific intended actions, forming the basis for training our optimized DL models. CognitiveArm also supports voice commands for seamless mode switching, enabling control of the prosthetic arm's 3 degrees of freedom (DoF). Running entirely on embedded hardware, it ensures low latency and real-time responsiveness. A full-scale prototype, interfaced with the OpenBCI UltraCortex Mark IV EEG headset, achieved up to 90% accuracy in classifying three core actions (left, right, idle). Voice integration enables multiplexed, variable movement for everyday tasks (e.g., handshake, cup picking), enhancing real-world performance and demonstrating CognitiveArm's potential for advanced prosthetic control.
Authors:Constantin Ruhdorfer, Matteo Bortoletto, Victor Oei, Anna Penzkofer, Andreas Bulling
Abstract:
We introduce Unsupervised Partner Design (UPD) - a population-free, multi-agent reinforcement learning framework for robust ad-hoc teamwork that adaptively generates training partners without requiring pretrained partners or manual parameter tuning. UPD constructs diverse partners by stochastically mixing an ego agent's policy with biased random behaviours and scores them using a variance-based learnability metric that prioritises partners near the ego agent's current learning frontier. We show that UPD can be integrated with unsupervised environment design, resulting in the first method enabling fully unsupervised curricula over both level and partner distributions in a cooperative setting. Through extensive evaluations on Overcooked-AI and the Overcooked Generalisation Challenge, we demonstrate that this dynamic partner curriculum is highly effective: UPD consistently outperforms both population-based and population-free baselines as well as ablations. In a user study, we further show that UPD achieves higher returns than all baselines and was perceived as significantly more adaptive, more human-like, a better collaborator, and less frustrating.
Authors:Xiaoqing Chen, Siyang Li, Dongrui Wu
Abstract:
Electroencephalogram (EEG) decoding models for brain-computer interfaces (BCIs) struggle with cross-dataset learning and generalization due to channel layout inconsistencies, non-stationary signal distributions, and limited neurophysiological prior integration. To address these issues, we propose a plug-and-play Alignment-Based Frame-Patch Modeling (AFPM) framework, which has two main components: 1) Spatial Alignment, which selects task-relevant channels based on brain-region priors, aligns EEG distributions across domains, and remaps the selected channels to a unified layout; and, 2) Frame-Patch Encoding, which models multi-dataset signals into unified spatiotemporal patches for EEG decoding. Compared to 17 state-of-the-art approaches that need dataset-specific tuning, the proposed calibration-free AFPM achieves performance gains of up to 4.40% on motor imagery and 3.58% on event-related potential tasks. To our knowledge, this is the first calibration-free cross-dataset EEG decoding framework, substantially enhancing the practicalness of BCIs in real-world applications.
Authors:Neil K. R. Sehgal, Hita Kambhamettu, Allen Chang, Andrew Zhu, Lyle Ungar, Sharath Chandra Guntuku
Abstract:
Effective communication in serious illness and palliative care is essential but often under-taught due to limited access to training resources like standardized patients. We present PAL (Palliative Assisted Learning-bot), a conversational system that simulates emotionally nuanced patient interactions and delivers structured feedback grounded in an existing empathy-based framework. PAL supports text and voice modalities and is designed to scaffold clinical skill-building through repeated, low-cost practice. Through a mixed-methods study with 17 U.S. medical trainees and clinicians, we explore user engagement with PAL, evaluate usability, and examine design tensions around modalities, emotional realism, and feedback delivery. Participants found PAL helpful for reflection and skill refinement, though some noted limitations in emotional authenticity and the adaptability of feedback. We contribute: (1) empirical evidence that large language models can support palliative communication training; (2) design insights for modality-aware, emotionally sensitive simulation tools; and (3) implications for systems that support emotional labor, cooperative learning, and AI-augmented training in high-stakes care settings.
Authors:Nabeel Nisar Bhat, Maksim Karnaukh, Jakob Struye, Rafael Berkvens, Jeroen Famaey
Abstract:
Person identification plays a vital role in enabling intelligent, personalized, and secure human-computer interaction. Recent research has demonstrated the feasibility of leveraging Wi-Fi signals for passive person identification using a person's unique gait pattern. Although most existing work focuses on sub-6 GHz frequencies, the emergence of mmWave offers new opportunities through its finer spatial resolution, though its comparative advantages for person identification remain unexplored. This work presents the first comparative study between sub-6 GHz and mmWave Wi-Fi signals for person identification with commercial off-the-shelf (COTS) Wi-Fi, using a novel dataset of synchronized measurements from the two frequency bands in an indoor environment. To ensure a fair comparison, we apply identical training pipelines and model configurations across both frequency bands. Leveraging end-to-end deep learning, we show that even at low sampling rates (10 Hz), mmWave Wi-Fi signals can achieve high identification accuracy (91.2% on 20 individuals) when combined with effective background subtraction.
Authors:Ningzhi Tang, David Meininger, Gelei Xu, Yiyu Shi, Yu Huang, Collin McMillan, Toby Jia-Jun Li
Abstract:
Code modification requires developers to comprehend code, plan changes, articulate intentions, and validate outcomes, making it a cognitively demanding process. Generated natural language code summaries aid comprehension but remain static and limited in supporting the full workflow. We present NaturalEdit, a system that makes code summaries interactive and adaptive representations directly linked to source code. Grounded in the Cognitive Dimensions of Notations, NaturalEdit implements a paradigm of code modification through interaction with natural language representations through three key features: (1) adaptive multi-faceted representation of code summaries with flexible Abstraction Gradient; (2) interactive mapping mechanisms between summaries and codes, ensuring a tight Closeness of Mapping; and (3) intent-driven, bidirectional synchronization that reduces Viscosity in editing and validation. A technical evaluation confirms the performance of NaturalEdit, and a user study with 12 developers shows that it enhances comprehension, intent articulation, and validation, giving developers greater confidence and control.
Authors:Nick Hagar, Ethan Silver, Clare Spencer, Nicholas Diakopoulos
Abstract:
Journalists face mounting challenges in monitoring ever-expanding digital information streams to identify newsworthy content. While traditional automation tools gather information at scale, they struggle with the editorial judgment needed to assess newsworthiness. This paper investigates whether large language models (LLMs) can serve as effective first-pass filters for journalistic monitoring. We develop a prompt-based approach encoding journalistic news values - timeliness, impact, controversy, and generalizability - into LLM instructions to extract and evaluate potential story leads. We validate our approach across multiple models against expert-annotated ground truth, then deploy a real-world monitoring pipeline that processes trade press articles daily. Our evaluation reveals strong performance in extracting relevant leads from source material ($F1=0.94$) and in coarse newsworthiness assessment ($\pm$1 accuracy up to 92%), but it consistently struggles with nuanced editorial judgments requiring beat expertise. The system proves most valuable as a hybrid tool combining automated monitoring with human review, successfully surfacing novel, high-value leads while filtering obvious noise. We conclude with practical recommendations for integrating LLM-powered monitoring into newsroom workflows that preserves editorial judgment while extending journalistic capacity.
Authors:Yuanhao Zhang, Wenbo Li, Xiaoyu Wang, Kangyu Yuan, Shuai Ma, Xiaojuan Ma
Abstract:
Asynchronous online discussions enable diverse participants to co-construct knowledge beyond individual contributions. This process ideally evolves through sequential phases, from superficial information exchange to deeper synthesis. However, many discussions stagnate in the early stages. Existing AI interventions typically target isolated phases, lacking mechanisms to progressively advance knowledge co-construction, and the impacts of different intervention styles in this context remain unclear and warrant investigation. To address these gaps, we conducted a design workshop to explore AI intervention strategies (task-oriented and/or relationship-oriented) throughout the knowledge co-construction process, and implemented them in an LLM-powered agent capable of facilitating progression while consolidating foundations at each phase. A within-subject study (N=60) involving five consecutive asynchronous discussions showed that the agent consistently promoted deeper knowledge progression, with different styles exerting distinct effects on both content and experience. These findings provide actionable guidance for designing adaptive AI agents that sustain more constructive online discussions.
Authors:Yuanhao Zhang, Xiaoyu Wang, Jiaxiong Hu, Ziqi Pan, Zhenhui Peng, Xiaojuan Ma
Abstract:
AI-infused systems have demonstrated remarkable capabilities in addressing diverse human needs within online communities. Their widespread adoption has shaped user experiences and community dynamics at scale. However, designing such systems requires a clear understanding of user needs, careful design decisions, and robust evaluation. While research on AI-infused systems for online communities has flourished in recent years, a comprehensive synthesis of this space remains absent. In this work, we present a systematic review of 77 studies, analyzing the systems they propose through three lenses: the challenges they aim to address, their design functionalities, and the evaluation strategies employed. The first two dimensions are organized around four core aspects of community participation: contribution, consumption, mediation, and moderation. Our analysis identifies common design and evaluation patterns, distills key design considerations, and highlights opportunities for future research on AI-infused systems in online communities.
Authors:Chenglin Yu, Yang Yu, Songmiao Wang, Yucheng Wang, Yifan Yang, Jinjia Li, Ming Li, Hongxia Yang
Abstract:
Large Language Model (LLM) agents have demonstrated remarkable capabilities in organizing and executing complex tasks, and many such agents are now widely used in various application scenarios. However, developing these agents requires carefully designed workflows, carefully crafted prompts, and iterative tuning, which requires LLM techniques and domain-specific expertise. These hand-crafted limitations hinder the scalability and cost-effectiveness of LLM agents across a wide range of industries. To address these challenges, we propose \textbf{InfiAgent}, a Pyramid-like DAG-based Multi-Agent Framework that can be applied to \textbf{infi}nite scenarios, which introduces several key innovations: a generalized "agent-as-a-tool" mechanism that automatically decomposes complex agents into hierarchical multi-agent systems; a dual-audit mechanism that ensures the quality and stability of task completion; an agent routing function that enables efficient task-agent matching; and an agent self-evolution mechanism that autonomously restructures the agent DAG based on new tasks, poor performance, or optimization opportunities. Furthermore, InfiAgent's atomic task design supports agent parallelism, significantly improving execution efficiency. This framework evolves into a versatile pyramid-like multi-agent system capable of solving a wide range of problems. Evaluations on multiple benchmarks demonstrate that InfiAgent achieves 9.9\% higher performance compared to ADAS (similar auto-generated agent framework), while a case study of the AI research assistant InfiHelper shows that it generates scientific papers that have received recognition from human reviewers at top-tier IEEE conferences.
Authors:Jiannan Xiang, Yun Zhu, Lei Shu, Maria Wang, Lijun Yu, Gabriel Barcik, James Lyon, Srinivas Sunkara, Jindong Chen
Abstract:
Developing and testing user interfaces (UIs) and training AI agents to interact with them are challenging due to the dynamic and diverse nature of real-world mobile environments. Existing methods often rely on cumbersome physical devices or limited static analysis of screenshots, which hinders scalable testing and the development of intelligent UI agents. We introduce UISim, a novel image-based UI simulator that offers a dynamic and interactive platform for exploring mobile phone environments purely from screen images. Our system employs a two-stage method: given an initial phone screen image and a user action, it first predicts the abstract layout of the next UI state, then synthesizes a new, visually consistent image based on this predicted layout. This approach enables the realistic simulation of UI transitions. UISim provides immediate practical benefits for UI testing, rapid prototyping, and synthetic data generation. Furthermore, its interactive capabilities pave the way for advanced applications, such as UI navigation task planning for AI agents. Our experimental results show that UISim outperforms end-to-end UI generation baselines in generating realistic and coherent subsequent UI states, highlighting its fidelity and potential to streamline UI development and enhance AI agent training.
Authors:Jiajun He, Xiaohan Shi, Cheng-Hung Hu, Jinyi Mi, Xingfeng Li, Tomoki Toda
Abstract:
Multimodal speech emotion recognition (SER) has emerged as pivotal for improving human-machine interaction. Researchers are increasingly leveraging both speech and textual information obtained through automatic speech recognition (ASR) to comprehensively recognize emotional states from speakers. Although this approach reduces reliance on human-annotated text data, ASR errors possibly degrade emotion recognition performance. To address this challenge, in our previous work, we introduced two auxiliary tasks, namely, ASR error detection and ASR error correction, and we proposed a novel multimodal fusion (MF) method for learning modality-specific and modality-invariant representations across different modalities. Building on this foundation, in this paper, we introduce two additional training strategies. First, we propose an adversarial network to enhance the diversity of modality-specific representations. Second, we introduce a label-based contrastive learning strategy to better capture emotional features. We refer to our proposed method as M4SER and validate its superiority over state-of-the-art methods through extensive experiments using IEMOCAP and MELD datasets.
Authors:Ao Qu, Yuxi Wen, Jiayi Zhang, Yunge Wen, Yibo Zhao, Alok Prakash, Andrés F. Salazar-Gómez, Paul Pu Liang, Jinhua Zhao
Abstract:
Classroom observation -- one of the most effective methods for teacher development -- remains limited due to high costs and a shortage of expert coaches. We present ClassMind, an AI-driven classroom observation system that integrates generative AI and multimodal learning to analyze classroom artifacts (e.g., class recordings) and deliver timely, personalized feedback aligned with pedagogical practices. At its core is AVA-Align, an agent framework that analyzes long classroom video recordings to generate temporally precise, best-practice-aligned feedback to support teacher reflection and improvement. Our three-phase study involved participatory co-design with educators, development of a full-stack system, and field testing with teachers at different stages of practice. Teachers highlighted the system's usefulness, ease of use, and novelty, while also raising concerns about privacy and the role of human judgment, motivating deeper exploration of future human--AI coaching partnerships. This work illustrates how multimodal AI can scale expert coaching and advance teacher development.
Authors:Md Rakibul Hasan, Md Zakir Hossain, Aneesh Krishna, Shafin Rahman, Tom Gedeon
Abstract:
When someone claims to be empathic, it does not necessarily mean they are perceived as empathic by the person receiving it. Empathy promotes supportive communication, yet the relationship between listeners' trait and state empathy and speakers' perceptions remains unclear. We conducted an experiment in which speakers described a personal incident and one or more listeners responded naturally, as in everyday conversation. Afterwards, speakers reported perceived empathy, and listeners reported their trait and state empathy. Reliability of the scales was high (Cronbach's $α= 0.805$--$0.888$). Nonparametric Kruskal-Wallis tests showed that speakers paired with higher trait-empathy listeners reported greater perceived empathy, with large effect sizes. In contrast, state empathy did not reliably differentiate speaker outcomes. To complement self-reports, we collected electrodermal activity and heart rate from listeners during the conversations, which shows that high trait empathy listeners exhibited higher physiological variability.
Authors:Xingang Guo, Yaxin Li, Xiangyi Kong, Yilan Jiang, Xiayu Zhao, Zhihua Gong, Yufan Zhang, Daixuan Li, Tianle Sang, Beixiao Zhu, Gregory Jun, Yingbing Huang, Yiqi Liu, Yuqi Xue, Rahul Dev Kundu, Qi Jian Lim, Yizhou Zhao, Luke Alexander Granger, Mohamed Badr Younis, Darioush Keivan, Nippun Sabharwal, Shreyanka Sinha, Prakhar Agarwal, Kojo Vandyck, Hanlin Mai, Zichen Wang, Aditya Venkatesh, Ayush Barik, Jiankun Yang, Chongying Yue, Jingjie He, Libin Wang, Licheng Xu, Hao Chen, Jinwen Wang, Liujun Xu, Rushabh Shetty, Ziheng Guo, Dahui Song, Manvi Jha, Weijie Liang, Weiman Yan, Bryan Zhang, Sahil Bhandary Karnoor, Jialiang Zhang, Rutva Pandya, Xinyi Gong, Mithesh Ballae Ganesh, Feize Shi, Ruiling Xu, Yifan Zhang, Yanfeng Ouyang, Lianhui Qin, Elyse Rosenbaum, Corey Snyder, Peter Seiler, Geir Dullerud, Xiaojia Shelly Zhang, Zuofu Cheng, Pavan Kumar Hanumolu, Jian Huang, Mayank Kulkarni, Mahdi Namazifar, Huan Zhang, Bin Hu
Abstract:
Today, industry pioneers dream of developing general-purpose AI engineers capable of designing and building humanity's most ambitious projects--from starships that will carry us to distant worlds to Dyson spheres that harness stellar energy. Yet engineering design represents a fundamentally different challenge for large language models (LLMs) compared to traditional textbook-style problem solving or factual question answering. Real-world engineering design demands the synthesis of domain knowledge, navigation of complex trade-offs, and management of the tedious processes that consume much of practicing engineers' time. Despite these shared challenges across engineering disciplines, no benchmark currently captures the unique demands of engineering design work. In this work, we introduce ENGDESIGN, an Engineering Design benchmark that evaluates LLMs' abilities to perform practical design tasks across nine engineering domains: Operating System Design, Computer Architecture Design, Control System Design, Mechanical Systems, Structural Design, Digital Hardware Design, Analog Integrated Circuit Design, Robotics, and Signal Processing. Unlike existing benchmarks that focus on factual recall or question answering, ENGDESIGN uniquely emphasizes LLMs' ability to synthesize domain knowledge, reason under constraints, and generate functional, objective-oriented designs. Each task in ENGDESIGN represents a real-world engineering design problem, accompanied by a detailed task description specifying design goals, constraints, and performance requirements. We pioneer a simulation-based evaluation paradigm where LLM-generated designs undergo rigorous testing through executable, domain-specific simulations-from circuit SPICE simulations to structural finite element analysis, from control system validation to robotic motion planning.
Authors:Alice Qian, Ziqi Yang, Ryland Shaw, Jina Suh, Laura Dabbish, Hong Shen
Abstract:
Responsible AI (RAI) content work, such as annotation, moderation, or red teaming for AI safety, often exposes crowd workers to potentially harmful content. While prior work has underscored the importance of communicating well-being risk to employed content moderators, designing effective disclosure mechanisms for crowd workers while balancing worker protection with the needs of task designers and platforms remains largely unexamined. To address this gap, we conducted co-design sessions with 29 task designers, workers, and platform representatives. We investigated task designer preferences for support in disclosing tasks, worker preferences for receiving risk disclosure warnings, and how platform stakeholders envision their role in shaping risk disclosure practices. We identify design tensions and map the sociotechnical tradeoffs that shape disclosure practices. We contribute design recommendations and feature concepts for risk disclosure mechanisms in the context of RAI content work.
Authors:Si Chen, Isabel R. Molnar, Peiyu Li, Adam Acunin, Ting Hua, Alex Ambrose, Nitesh V. Chawla, Ronald Metoyer
Abstract:
Large language models (LLMs) typically generate direct answers, yet they are increasingly used as learning tools. Studying instructors' usage is critical, given their role in teaching and guiding AI adoption in education. We designed and evaluated TeaPT, an LLM for pedagogical purposes that supports instructors' professional development through two conversational approaches: a Socratic approach that uses guided questioning to foster reflection, and a Narrative approach that offers elaborated suggestions to extend externalized cognition. In a mixed-method study with 41 higher-education instructors, the Socratic version elicited greater engagement, while the Narrative version was preferred for actionable guidance. Subgroup analyses further revealed that less-experienced, AI-optimistic instructors favored the Socratic version, whereas more-experienced, AI-cautious instructors preferred the Narrative version. We contribute design implications for LLMs for pedagogical purposes, showing how adaptive conversational approaches can support instructors with varied profiles while highlighting how AI attitudes and experience shape interaction and learning.
Authors:Abdul Rehman, Ilona Heldal, Diana Stilwell, Paula Costa Ferreira, Jerry Chun-Wei Lin
Abstract:
Eye tracking (ET) can help to understand visual attention and cognitive processes in interactive environments. This study presents a comprehensive eye-tracking analysis framework of the Inhibitory Control Game, named the ReStroop game, which is an educational intervention aimed at improving inhibitory control skills in children through a recycling-themed sorting task, for educational assessment that processes raw gaze data through unified algorithms for fixation detection, performance evaluation, and personalized intervention planning. The system employs dual-threshold eye movement detection (I-VT and advanced clustering), comprehensive Area of Interest (AOI) analysis, and evidence-based risk assessment to transform gaze patterns into actionable educational insights. We evaluated this framework across three difficulty levels and revealed critical attention deficits, including low task relevance, elevated attention scatter, and compromised processing efficiency. The multi-dimensional risk assessment identified high to moderate risk levels, triggering personalized interventions including focus training, attention regulation support, and environmental modifications. The system successfully distinguishes between adaptive learning and cognitive overload, providing early warning indicators for educational intervention. Results demonstrate the system's effectiveness in objective attention assessment, early risk identification, and the generation of evidence-based recommendations for students, teachers, and specialists, supporting data-driven educational decision-making and personalized learning approaches.
Authors:Abdul Rehman, Ilona Heldal, Cristina Costescu, Carmen David, Jerry Chun-Wei Lin
Abstract:
This paper introduces an innovative adaptive scoring framework for children with Neurodevelopmental Disorders (NDD) that is attributed to the integration of multiple metrics, such as spatial attention patterns, temporal engagement, and game performance data, to create a comprehensive assessment of learning that goes beyond traditional game scoring. The framework employs a progressive difficulty adaptation method, which focuses on specific stimuli for each level and adjusts weights dynamically to accommodate increasing cognitive load and learning complexity. Additionally, it includes capabilities for temporal analysis, such as detecting engagement periods, providing rewards for sustained attention, and implementing an adaptive multiplier framework based on performance levels. To avoid over-rewarding high performers while maximizing improvement potential for students who are struggling, the designed framework features an adaptive temporal impact framework that adjusts performance scales accordingly. We also established a multi-metric validation framework using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Pearson correlation, and Spearman correlation, along with defined quality thresholds for assessing deployment readiness in educational settings. This research bridges the gap between technical eye-tracking metrics and educational insights by explicitly mapping attention patterns to learning behaviors, enabling actionable pedagogical interventions.
Authors:Farah Baracat, Agnese Grison, Dario Farina, Giacomo Indiveri, Elisa Donati
Abstract:
Restoring naturalistic finger control in assistive technologies requires the continuous decoding of motor intent with high accuracy, efficiency, and robustness. Here, we present a spike-based decoding framework that integrates spiking neural networks (SNNs) with motor unit activity extracted from high-density intramuscular microelectrode arrays. We demonstrate simultaneous and proportional decoding of individual finger forces from motor unit spike trains during isometric contractions at 15% of maximum voluntary contraction using SNNs. We systematically evaluated alternative SNN decoder configurations and compared two possible input modalities: physiologically grounded motor unit spike trains and spike-encoded intramuscular EMG signals. Through this comparison, we quantified trade-offs between decoding accuracy, memory footprint, and robustness to input errors. The results showed that shallow SNNs can reliably decode finger-level motor intent with competitive accuracy and minimal latency, while operating with reduced memory requirements and without the need for external preprocessing buffers. This work provides a practical blueprint for integrating SNNs into finger-level force decoding systems, demonstrating how the choice of input representation can be strategically tailored to meet application-specific requirements for accuracy, robustness, and memory efficiency.
Authors:Yeaeun Gong, Yifan Liu, Lanyu Shang, Na Wei, Dong Wang
Abstract:
In this paper, we study the problem of AI explanation of misinformation, where the goal is to identify explanation designs that help improve users' misinformation detection abilities and their overall user experiences. Our work is motivated by the limitations of current Explainable AI (XAI) approaches, which predominantly focus on content explanations that elucidate the linguistic features and sentence structures of the misinformation. To address this limitation, we explore various explanations beyond content explanation, such as "social explanation" that considers the broader social context surrounding misinformation, as well as a "combined explanation" where both the content and social explanations are presented in scenarios that are either aligned or misaligned with each other. To evaluate the comparative effectiveness of these AI explanations, we conduct two online crowdsourcing experiments in the COVID-19 (Study 1 on Prolific) and Politics domains (Study 2 on MTurk). Our results show that AI explanations are generally effective in aiding users to detect misinformation, with effectiveness significantly influenced by the alignment between content and social explanations. We also find that the order in which explanation types are presented - specifically, whether a content or social explanation comes first - can influence detection accuracy, with differences found between the COVID-19 and Political domains. This work contributes towards more effective design of AI explanations, fostering a deeper understanding of how different explanation types and their combinations influence misinformation detection.
Authors:Evan G. Center, Matti Pouke, Alessandro Nardi, Lukas Gehrke, Klaus Gramann, Timo Ojala, Steven M. LaValle
Abstract:
Presence in virtual reality (VR), the subjective sense of "being there" in a virtual environment, is notoriously difficult to measure. Electroencephalography (EEG) may offer a promising, unobtrusive means of assessing a user's momentary state of presence. Unlike traditional questionnaires, EEG does not interrupt the experience or rely on users' retrospective self-reports, thereby avoiding interference with the very state it aims to capture. Previous research has attempted to quantify presence in virtual environments using event-related potentials (ERPs). We contend, however, that previous efforts have fallen short of fully realizing this goal, failing to either A) independently manipulate presence, B) validate their measure of presence against traditional techniques, C) adequately separate the constructs of presence and attention, and/or D) implement a realistic and immersive environment and task. We address these shortcomings in a preregistered ERP experiment in which participants play an engaging target shooting game in VR. ERPs are time-locked to the release of a ball from a sling. We induce breaks in presence (BIPs) by freezing the ball's release on a minority of trials. Embodiment is manipulated by allowing manual manipulation of the sling with a realistic avatar in one condition (embodied condition) and passive manipulation with only controllers in another (non-embodied condition). We support our predictions that the N2, the P3b, and the N400, are selectively sensitive towards specific components of these manipulations. The pattern of findings carries significant implications for theories of presence, which have been seldom addressed in previous ERP investigations on this topic.
Authors:Abdul Rehman, Ilona Heldal, Jerry Chun-Wei Lin
Abstract:
Eye Tracking (ET) can help to understand visual attention and cognitive processes in interactive environments. In attention tasks, distinguishing between relevant target objects and distractors is crucial for effective performance, yet the underlying gaze patterns that drive successful task completion remain incompletely understood. Traditional gaze analyses lack comprehensive insights into the temporal dynamics of attention allocation and the relationship between gaze behavior and task performance. When applied to complex visual search scenarios, current gaze analysis methods face several limitations, including the isolation of measurements, visual stability, search efficiency, and the decision-making processes involved in these scenarios. This paper proposes an analysis tool that considers time series for eye tracking data from task performance and also gaze measures (fixations, saccades and smooth pursuit); temporal pattern analysis that reveals how attention evolves throughout task performance; object-click sequence tracking that directly links visual attention to user actions; and performance metrics that quantify both accuracy and efficiency. This tool provides comprehensive visualization techniques that make complex patterns of stimuli and gaze connections interpretable.
Authors:Liuqing Chen, Yaxuan Song, Ke Lyu, Shuhong Xiao, Yilang Shen, Lingyun Sun
Abstract:
Car-riding is common for children in modern life. Given the repetitive nature of daily commutes, they often feel bored and turn to electronic devices for entertainment. Meanwhile, the rich and dynamic scenery outside the car naturally attracts children's curiosity and offers valuable resources for cognitive development. Our formative study reveals that parents' support during car rides is often fleeting, as accompanying adults may struggle to consistently guide children's exploration. To address this, we propose SCENIC, an interactive system that helps children aged 6 to 11 better perceive the external environment using location-based cognitive development strategies. SCENIC builds upon experiential approaches used by parents, resulting in six strategies embedded into the system. To improve engagement during routine rides, SCENIC also incorporates dynamic point-of-interest selection and journey gallery generation. We evaluated the generated content (N=21) and conducted an in-situ user study with seven families and ten children. Results suggest that SCENIC enhances the car-riding experience and helps children better connect with their surroundings.
Authors:Eetu Laukka, Evan G. Center, Timo Ojala, Steven M. LaValle, Matti Pouke
Abstract:
Mobile telepresence robots allow users to feel present and explore remote environments using technology. Traditionally, these systems are implemented using a camera onboard a mobile robot that can be controlled. Although high-immersion technologies, such as 360-degree cameras, can increase situational awareness and presence, they also introduce significant challenges. Additional processing and bandwidth requirements often result in latencies of up to seconds. The current delay with a 360-degree camera streaming over the internet makes real-time control of these systems difficult. Working with high-latency systems requires some form of assistance to the users.
This study presents a novel way to utilize optical flow to create an illusion of self-motion to the user during the latency period between user sending motion commands to the robot and seeing the actual motion through the 360-camera stream. We find no significant benefit of using the self-motion illusion to performance or accuracy of controlling a telepresence robot with a latency of 500 ms, as measured by the task completion time and collisions into objects. Some evidence is shown that the method might increase virtual reality (VR) sickness, as measured by the simulator sickness questionnaire (SSQ). We conclude that further adjustments are necessary in order to render the method viable.
Authors:Anaïs Halin, Adrien Deliège, Christel Devue, Marc Van Droogenbroeck
Abstract:
In this simulator study, we investigate how gaze parameters reflect driver cognitive distraction under varying traffic conditions and adaptive cruise control (ACC) use. Participants completed six driving scenarios that combined two levels of cognitive distraction (with/without mental calculations) and three levels of driving environment complexity. Throughout the experiment, participants were free to activate or deactivate an ACC. We analyzed two gaze-based indicators of driver cognitive distraction: the percent road center, and the gaze dispersions (horizontal and vertical). Our results show that vertical gaze dispersion increases with traffic complexity, while ACC use leads to gaze concentration toward the road center. Cognitive distraction reduces road center gaze and increases vertical dispersion. Complementary analyses revealed that these observations actually arise mainly between mental calculations, while periods of mental calculations are characterized by a temporary increase in gaze concentration.
Authors:Anaïs Halin, Marc Van Droogenbroeck, Christel Devue
Abstract:
In this simulator study, we investigate whether and how electrodermal activity (EDA) reflects driver cognitive distraction under varying traffic conditions and adaptive cruise control (ACC) use. Participants drove in six scenarios, combining two levels of cognitive distraction (presence/absence of a mental calculation task) and three levels of driving environment complexity (different traffic conditions). Throughout the experiment, they were free to activate or deactivate ACC (ACC use, two levels). We analyzed three EDA-based indicators of cognitive distraction: SCL (mean skin conductance level), SCR amplitude (mean amplitude of skin conductance responses), and SCR rate (rate of skin conductance responses). Results indicate that all three indicators were significantly influenced by cognitive distraction and ACC use, while environment complexity influenced SCL and SCR amplitude, but not SCR rate. These findings suggest that EDA-based indicators reflect variations in drivers' mental workload due not only to cognitive distraction, but also to driving environment and automation use.
Authors:Anaïs Halin, Christel Devue, Marc Van Droogenbroeck
Abstract:
The increasing integration of automation in vehicles aims to enhance both safety and comfort, but it also introduces new risks, including driver disengagement, reduced situation awareness, and mode confusion. In this work, we propose the DEV framework, a closed-loop framework for risk-aware adaptive driving automation that captures the dynamic interplay between the driver, the environment, and the vehicle. The framework promotes to continuously adjusting the operational level of automation based on a risk management strategy. The real-time risk assessment supports smoother transitions and effective cooperation between the driver and the automation system. Furthermore, we introduce a nomenclature of indexes corresponding to each core component, namely driver involvement, environment complexity, and vehicle engagement, and discuss how their interaction influences driving risk. The DEV framework offers a comprehensive perspective to align multidisciplinary research efforts and guide the development of dynamic, risk-aware driving automation systems.
Authors:Ziqi Pan, Runhua Zhang, Jiehui Luo, Yuanhao Zhang, Yue Deng, Xiaojuan Ma
Abstract:
Hashtags serve as identity markers and connection tools in online queer communities. Recently, the Western-origin #wlw (women-loving-women) hashtag has risen in the Chinese lesbian community on RedNote, coinciding with user migration triggered by the temporary US TikTok ban. This event provides a unique lens to study cross-cultural hashtag ingress and diffusion through the populations' responsive behaviors in cyber-migration. In this paper, we conducted a two-phase content analysis of 418 #wlw posts from January and April, examining different usage patterns during the hashtag's ingress and diffusion. Results indicate that the successful introduction of #wlw was facilitated by TikTok immigrants' bold importation, both populations' mutual interpretation, and RedNote natives' discussions. In current manifestation of diffusion, #wlw becomes a RedNote-recognized queer hashtag for sharing queer life, and semantically expands to support feminism discourse. Our findings provide empirical insights for enhancing the marginalized communities' cross-cultural communication.
Authors:Xiaoye Zuo, Nikos Athanasiou, Ginger Delmas, Yiming Huang, Xingyu Fu, Lingjie Liu
Abstract:
Good form is the difference between strength and strain, yet for the fast-growing community of at-home fitness enthusiasts, expert feedback is often out of reach. FormCoach transforms a simple camera into an always-on, interactive AI training partner, capable of spotting subtle form errors and delivering tailored corrections in real time, leveraging vision-language models (VLMs). We showcase this capability through a web interface and benchmark state-of-the-art VLMs on a dataset of 1,700 expert-annotated user-reference video pairs spanning 22 strength and mobility exercises. To accelerate research in AI-driven coaching, we release both the dataset and an automated, rubric-based evaluation pipeline, enabling standardized comparison across models. Our benchmarks reveal substantial gaps compared to human-level coaching, underscoring both the challenges and opportunities in integrating nuanced, context-aware movement analysis into interactive AI systems. By framing form correction as a collaborative and creative process between humans and machines, FormCoach opens a new frontier in embodied AI.
Authors:Tram Thi Minh Tran, Xinyan Yu, Callum Parker, Julie Stephany Berrio Perez, Stewart Worrall, Martin Tomitsch
Abstract:
Pedestrian gestures play an important role in traffic communication, particularly in interactions with autonomous vehicles (AVs), yet their subtle, ambiguous, and context-dependent nature poses persistent challenges for machine interpretation. This study investigates these challenges by using GPT-4V, a vision-language model, not as a performance benchmark but as a diagnostic tool to reveal patterns and causes of gesture misrecognition. We analysed a public dataset of pedestrian-vehicle interactions, combining manual video review with thematic analysis of the model's qualitative reasoning. This dual approach surfaced recurring factors influencing misrecognition, including gesture visibility, pedestrian behaviour, interaction context, and environmental conditions. The findings suggest practical considerations for gesture design, including the value of salience and contextual redundancy, and highlight opportunities to improve AV recognition systems through richer context modelling and uncertainty-aware interpretations. While centred on AV-pedestrian interaction, the method and insights are applicable to other domains where machines interpret human gestures, such as wearable AR and assistive technologies.
Authors:Daniel Lee, Nikhil Sharma, Donghoon Shin, DaEun Choi, Harsh Sharma, Jeonghwan Kim, Heng Ji
Abstract:
Generative AI has made image creation more accessible, yet aligning outputs with nuanced creative intent remains challenging, particularly for non-experts. Existing tools often require users to externalize ideas through prompts or references, limiting fluid exploration. We introduce ThematicPlane, a system that enables users to navigate and manipulate high-level semantic concepts (e.g., mood, style, or narrative tone) within an interactive thematic design plane. This interface bridges the gap between tacit creative intent and system control. In our exploratory study (N=6), participants engaged in divergent and convergent creative modes, often embracing unexpected results as inspiration or iteration cues. While they grounded their exploration in familiar themes, differing expectations of how themes mapped to outputs revealed a need for more explainable controls. Overall, ThematicPlane fosters expressive, iterative workflows and highlights new directions for intuitive, semantics-driven interaction in generative design tools.
Authors:Ningzhi Tang, Emory Smith, Yu Huang, Collin McMillan, Toby Jia-Jun Li
Abstract:
This paper presents a study of using large language models (LLMs) in modifying existing code. While LLMs for generating code have been widely studied, their role in code modification remains less understood. Although "prompting" serves as the primary interface for developers to communicate intents to LLMs, constructing effective prompts for code modification introduces challenges different from generation. Prior work suggests that natural language summaries may help scaffold this process, yet such approaches have been validated primarily in narrow domains like SQL rewriting. This study investigates two prompting strategies for LLM-assisted code modification: Direct Instruction Prompting, where developers describe changes explicitly in free-form language, and Summary-Mediated Prompting, where changes are made by editing the generated summaries of the code. We conducted an exploratory study with 15 developers who completed modification tasks using both techniques across multiple scenarios. Our findings suggest that developers followed an iterative workflow: understanding the code, localizing the edit, and validating outputs through execution or semantic reasoning. Each prompting strategy presented trade-offs: direct instruction prompting was more flexible and easier to specify, while summary-mediated prompting supported comprehension, prompt scaffolding, and control. Developers' choice of strategy was shaped by task goals and context, including urgency, maintainability, learning intent, and code familiarity. These findings highlight the need for more usable prompt interactions, including adjustable summary granularity, reliable summary-code traceability, and consistency in generated summaries.
Authors:Chrysovalanto Messiou, Riender Happee, Georgios Papaioannou
Abstract:
Automated vehicles will allow occupants to engage in non-driving tasks, but limited visual cues will make them vulnerable to unexpected movements. These unpredictable perturbations create a "surprise factor," forcing the central nervous system to rely on compensatory postural adjustments, which are less effective, and are more likely to trigger sensory conflicts. Since the head is a key reference for sensory input (vestibular and vision), models accurately capturing head-neck postural stabilization are essential for assessing AV comfort. This study extends an existing model predictive control-based framework to simulate head-neck postural control under lateral perturbations. Experimental validation against human data demonstrates that the model can accurately reproduce dynamic responses during lateral trunk perturbations. The results show that muscle effort combined with partial somatosensory feedback provides the best overall dynamic fit without requiring corrective relative and global head orientation integrators for posture.
Authors:Qiaoqiao Ren, Tony Belpaeme
Abstract:
As large language models (LLMs) become increasingly integrated into robotic systems, their potential to generate socially and culturally appropriate affective touch remains largely unexplored. This study investigates whether LLMs-specifically GPT-3.5, GPT-4, and GPT-4o --can generate culturally adaptive tactile behaviours to convey emotions in human-robot interaction. We produced text based touch descriptions for 12 distinct emotions across three cultural contexts (Chinese, Belgian, and unspecified), and examined their interpretability in both robot-to-human and human-to-robot scenarios. A total of 90 participants (36 Chinese, 36 Belgian, and 18 culturally unspecified) evaluated these LLM-generated tactile behaviours for emotional decoding and perceived appropriateness. Results reveal that: (1) under matched cultural conditions, participants successfully decoded six out of twelve emotions-mainly socially oriented emotions such as love and Ekman emotions such as anger, however, self-focused emotions like pride and embarrassment were more difficult to interpret; (2) tactile behaviours were perceived as more appropriate when directed from human to robot than from robot to human, revealing an asymmetry in social expectations based on interaction roles; (3) behaviours interpreted as aggressive (e.g., anger), overly intimate (e.g., love), or emotionally ambiguous (i.e., not clearly decodable) were significantly more likely to be rated as inappropriate; and (4) cultural mismatches reduced decoding accuracy and increased the likelihood of behaviours being judged as inappropriate.
Authors:David James Woo, Yangyang Yu, Kai Guo, Yilin Huang, April Ka Yeng Fung
Abstract:
Text generated by artificial intelligence (AI) chatbots is increasingly used in English as a foreign language (EFL) writing contexts, yet its impact on students' expository writing process and compositions remains understudied. This research examines how EFL secondary students edit AI-generated text. Exploring editing behaviors in their expository writing process and in expository compositions, and their effect on human-rated scores for content, organization, language, and overall quality. Participants were 39 Hong Kong secondary students who wrote an expository composition with AI chatbots in a workshop. A convergent design was employed to analyze their screen recordings and compositions to examine students' editing behaviors and writing qualities. Analytical methods included qualitative coding, descriptive statistics, temporal sequence analysis, human-rated scoring, and multiple linear regression analysis. We analyzed over 260 edits per dataset, and identified two editing patterns: one where students refined introductory units repeatedly before progressing, and another where they quickly shifted to extensive edits in body units (e.g., topic and supporting sentences). MLR analyses revealed that the number of AI-generated words positively predicted all score dimensions, while most editing variables showed minimal impact. These results suggest a disconnect between students' significant editing effort and improved composition quality, indicating AI supports but does not replace writing skills. The findings highlight the importance of genre-specific instruction and process-focused writing before AI integration. Educators should also develop assessments valuing both process and product to encourage critical engagement with AI text.
Authors:Tram Thi Minh Tran, Xinyan Yu, Marius Hoggenmueller, Callum Parker, Paul Schmitt, Julie Stephany Berrio Perez, Stewart Worrall, Martin Tomitsch
Abstract:
Autonomous mobility systems increasingly operate in environments shared with animals, from urban pets to wildlife. However, their design has largely focused on human interaction, with limited understanding of how non-human species perceive, respond to, or are affected by these systems. Motivated by research in Animal-Computer Interaction (ACI) and more-than-human design, this study investigates animal interactions with autonomous mobility through a multi-method approach combining a scoping review (45 articles), online ethnography (39 YouTube videos and 11 Reddit discussions), and expert interviews (8 participants). Our analysis surfaces five key areas of concern: Physical Impact (e.g., collisions, failures to detect), Behavioural Effects (e.g., avoidance, stress), Accessibility Concerns (particularly for service animals), Ethics and Regulations, and Urban Disturbance. We conclude with design and policy directions aimed at supporting multispecies coexistence in the age of autonomous systems. This work underscores the importance of incorporating non-human perspectives to ensure safer, more inclusive futures for all species.
Authors:Jiahua Tang, Song Wang, Jiachen Zou, Chen Wei, Quanying Liu
Abstract:
Understanding how the human brain encodes and processes external visual stimuli has been a fundamental challenge in neuroscience. With advancements in artificial intelligence, sophisticated visual decoding architectures have achieved remarkable success in fMRI research, enabling more precise and fine-grained spatial concept localization. This has provided new tools for exploring the spatial representation of concepts in the brain. However, despite the millisecond-scale temporal resolution of EEG, which offers unparalleled advantages in tracking the dynamic evolution of cognitive processes, the temporal dynamics of neural representations based on EEG remain underexplored. This is primarily due to EEG's inherently low signal-to-noise ratio and its complex spatiotemporal coupling characteristics. To bridge this research gap, we propose a novel approach that integrates advanced neural decoding algorithms to systematically investigate how low-dimensional object properties are temporally encoded in EEG signals. We are the first to attempt to identify the specificity and prototypical temporal characteristics of concepts within temporal distributions. Our framework not only enhances the interpretability of neural representations but also provides new insights into visual decoding in brain-computer interfaces (BCI).
Authors:Anaïs Halin, Marc Van Droogenbroeck, Christel Devue
Abstract:
In this simulator study, we adopt a human-centered approach to explore whether and how drivers' cognitive state and driving environment complexity influence reliance on driving automation features. Besides, we examine whether such reliance affects driving performance. Participants operated a vehicle equipped with adaptive cruise control (ACC) in a simulator across six predefined driving scenarios varying in traffic conditions while either performing a cognitively demanding task (i.e., responding to mental calculations) or not. Throughout the experiment, participants had to respect speed limits and were free to activate or deactivate ACC. In complex driving environments, we found that the overall ACC engagement time was lower compared to less complex driving environments. We observed no significant effect of cognitive load on ACC use. Furthermore, while ACC use had no effect on the number of lane changes, it impacted the speed limits compliance and improved lateral control.
Authors:Santiago de Leon-Martinez, Robert Moro, Branislav Kveton, Maria Bielikova
Abstract:
Carousels have become the de-facto interface in online services. However, there is a lack of research in carousels, particularly examining how recommender systems may be designed differently than the traditional single-list interfaces. One of the key elements for understanding how to design a system for a particular interface is understanding how users browse. For carousels, users may browse in a number of different ways due to the added complexity of multiple topic defined-lists and swiping to see more items.
Eye tracking is the key to understanding user behavior by providing valuable, direct information on how users see and navigate. In this work, we provide the first extensive analysis of the eye tracking behavior in carousel recommenders under the free-browsing setting. To understand how users browse, we examine the following research questions : 1) where do users start browsing, 2) how do users transition from item to item within the same carousel and across carousels, and 3) how does genre preference impact transitions?
This work addresses a gap in the field and provides the first extensive empirical results of eye tracked browsing behavior in carousels for improving recommenders. Taking into account the insights learned from the above questions, our final contribution is to provide suggestions to help carousel recommender system designers optimize their systems for user browsing behavior. The most important suggestion being to reorder the ranked item positions to account for browsing after swiping.These contributions aim not only to help improve current systems, but also to encourage and allow the design of new user models, systems, and metrics that are better suited to the complexity of carousel interfaces.
Authors:Hang Wang, Junshan Zhang
Abstract:
Multi-agent reinforcement learning faces fundamental challenges that conventional approaches have failed to overcome: exponentially growing joint action spaces, non-stationary environments where simultaneous learning creates moving targets, and partial observability that constrains coordination. Current methods remain reactive, employing stimulus-response mechanisms that fail when facing novel scenarios. We argue for a transformative paradigm shift from reactive to proactive multi-agent intelligence through generative AI-based reinforcement learning. This position advocates reconceptualizing agents not as isolated policy optimizers, but as sophisticated generative models capable of synthesizing complex multi-agent dynamics and making anticipatory decisions based on predictive understanding of future interactions. Rather than responding to immediate observations, generative-RL agents can model environment evolution, predict other agents' behaviors, generate coordinated action sequences, and engage in strategic reasoning accounting for long-term dynamics. This approach leverages pattern recognition and generation capabilities of generative AI to enable proactive decision-making, seamless coordination through enhanced communication, and dynamic adaptation to evolving scenarios. We envision this paradigm shift will unlock unprecedented possibilities for distributed intelligence, moving beyond individual optimization toward emergent collective behaviors representing genuine collaborative intelligence. The implications extend across autonomous systems, robotics, and human-AI collaboration, promising solutions to coordination challenges intractable under traditional reactive frameworks.
Authors:Maximilian Zipfl, Pascal Zwick, Patrick Schulz, Marc Rene Zofka, Albert Schotschneider, Helen Gremmelmaier, Nikolai Polley, Ferdinand Mütsch, Kevin Simon, Fabian Gottselig, Michael Frey, Sergio Marschall, Akim Stark, Maximilian Müller, Marek Wehmer, Mihai Kocsis, Dominic Waldenmayer, Florian Schnepf, Erik Heinrich, Sabrina Pletz, Matthias Kölle, Karin Langbein-Euchner, Alexander Viehl, Raoul Zöllner, J. Marius Zöllner
Abstract:
In the future, mobility will be strongly shaped by the increasing use of digitalization. Not only will individual road users be highly interconnected, but also the road and associated infrastructure. At that point, a Digital Twin becomes particularly appealing because, unlike a basic simulation, it offers a continuous, bilateral connection linking the real and virtual environments. This paper describes the digital reconstruction used to develop the Digital Twin of the Test Area Autonomous Driving-Baden-Württemberg (TAF-BW), Germany. The TAF-BW offers a variety of different road sections, from high-traffic urban intersections and tunnels to multilane motorways. The test area is equipped with a comprehensive Vehicle-to-Everything (V2X) communication infrastructure and multiple intelligent intersections equipped with camera sensors to facilitate real-time traffic flow monitoring. The generation of authentic data as input for the Digital Twin was achieved by extracting object lists at the intersections. This process was facilitated by the combined utilization of camera images from the intelligent infrastructure and LiDAR sensors mounted on a test vehicle. Using a unified interface, recordings from real-world detections of traffic participants can be resimulated. Additionally, the simulation framework's design and the reconstruction process is discussed. The resulting framework is made publicly available for download and utilization at: https://digit4taf-bw.fzi.de The demonstration uses two case studies to illustrate the application of the digital twin and its interfaces: the analysis of traffic signal systems to optimize traffic flow and the simulation of security-related scenarios in the communications sector.
Authors:Zahra Ashktorab, Elizabeth M. Daly, Erik Miehling, Werner Geyer, Martin Santillan Cooper, Tejaswini Pedapati, Michael Desmond, Qian Pan, Hyo Jin Do
Abstract:
With the broad availability of large language models and their ability to generate vast outputs using varied prompts and configurations, determining the best output for a given task requires an intensive evaluation process, one where machine learning practitioners must decide how to assess the outputs and then carefully carry out the evaluation. This process is both time-consuming and costly. As practitioners work with an increasing number of models, they must now evaluate outputs to determine which model and prompt performs best for a given task. LLMs are increasingly used as evaluators to filter training data, evaluate model performance, assess harms and risks, or assist human evaluators with detailed assessments. We present EvalAssist, a framework that simplifies the LLM-as-a-judge workflow. The system provides an online criteria development environment, where users can interactively build, test, and share custom evaluation criteria in a structured and portable format. We support a set of LLM-based evaluation pipelines that leverage off-the-shelf LLMs and use a prompt-chaining approach we developed and contributed to the UNITXT open-source library. Additionally, our system also includes specially trained evaluators to detect harms and risks in LLM outputs. We have deployed the system internally in our organization with several hundreds of users.
Authors:Deepak Varuvel Dennison, Bakhtawar Ahtisham, Kavyansh Chourasia, Nirmit Arora, Rahul Singh, Rene F. Kizilcec, Akshay Nambi, Tanuja Ganu, Aditya Vashistha
Abstract:
This study investigates Shiksha copilot, an AI-assisted lesson planning tool deployed in government schools across Karnataka, India. The system combined LLMs and human expertise through a structured process in which English and Kannada lesson plans were co-created by curators and AI; teachers then further customized these curated plans for their classrooms using their own expertise alongside AI support. Drawing on a large-scale mixed-methods study involving 1,043 teachers and 23 curators, we examine how educators collaborate with AI to generate context-sensitive lesson plans, assess the quality of AI-generated content, and analyze shifts in teaching practices within multilingual, low-resource environments. Our findings show that teachers used Shiksha copilot both to meet administrative documentation needs and to support their teaching. The tool eased bureaucratic workload, reduced lesson planning time, and lowered teaching-related stress, while promoting a shift toward activity-based pedagogy. However, systemic challenges such as staffing shortages and administrative demands constrained broader pedagogical change. We frame these findings through the lenses of teacher-AI collaboration and communities of practice to examine the effective integration of AI tools in teaching. Finally, we propose design directions for future teacher-centered EdTech, particularly in multilingual and Global South contexts.
Authors:Lucas Joos, Daniel A. Keim, Maximilian T. Fischer
Abstract:
The creation of systematic literature reviews (SLR) is critical for analyzing the landscape of a research field and guiding future research directions. However, retrieving and filtering the literature corpus for an SLR is highly time-consuming and requires extensive manual effort, as keyword-based searches in digital libraries often return numerous irrelevant publications. In this work, we propose a pipeline leveraging multiple large language models (LLMs), classifying papers based on descriptive prompts and deciding jointly using a consensus scheme. The entire process is human-supervised and interactively controlled via our open-source visual analytics web interface, LLMSurver, which enables real-time inspection and modification of model outputs. We evaluate our approach using ground-truth data from a recent SLR comprising over 8,000 candidate papers, benchmarking both open and commercial state-of-the-art LLMs from mid-2024 and fall 2025. Results demonstrate that our pipeline significantly reduces manual effort while achieving lower error rates than single human annotators. Furthermore, modern open-source models prove sufficient for this task, making the method accessible and cost-effective. Overall, our work demonstrates how responsible human-AI collaboration can accelerate and enhance systematic literature reviews within academic workflows.
Authors:Shannon Liu, Maria Teresa Parreira, Wendy Ju
Abstract:
As robots become more integrated into society, detecting robot errors is essential for effective human-robot interaction (HRI). When a robot fails repeatedly, how can it know when to change its behavior? Humans naturally respond to robot errors through verbal and nonverbal cues that intensify over successive failures-from confusion and subtle speech changes to visible frustration and impatience. While prior work shows that human reactions can indicate robot failures, few studies examine how these evolving responses reveal successive failures. This research uses machine learning to recognize stages of robot failure from human reactions. In a study with 26 participants interacting with a robot that made repeated conversational errors, behavioral features were extracted from video data to train models for individual users. The best model achieved 93.5% accuracy for detecting errors and 84.1% for classifying successive failures. Modeling the progression of human reactions enhances error detection and understanding of repeated interaction breakdowns in HRI.
Authors:Jiugeng Sun, Rita Sevastjanova, Sina Ahmadi, Rico Sennrich, Mennatallah El-Assady
Abstract:
Dialects suffer from the scarcity of computational textual resources as they exist predominantly in spoken rather than written form and exhibit remarkable geographical diversity. Collecting dialect data and subsequently integrating it into current language technologies present significant obstacles. Gamification has been proven to facilitate remote data collection processes with great ease and on a substantially wider scale. This paper introduces Dia-Lingle, a gamified interface aimed to improve and facilitate dialectal data collection tasks such as corpus expansion and dialect labelling. The platform features two key components: the first challenges users to rewrite sentences in their dialects, identifies them through a classifier and solicits feedback, and the other one asks users to match sentences to their geographical locations. Dia-Lingle combines active learning with gamified difficulty levels, strategically encouraging prolonged user engagement while efficiently enriching the dialect corpus. Usability evaluation shows that our interface demonstrates high levels of user satisfaction. We provide the link to Dia-Lingle: https://dia-lingle.ivia.ch/, and demo video: https://youtu.be/0QyJsB8ym64.
Authors:Smita Khapre, Melkamu Abay Mersha, Hassan Shakil, Jonali Baruah, Jugal Kalita
Abstract:
The evolution of digital communication systems and the designs of online platforms have inadvertently facilitated the subconscious propagation of toxic behavior. Giving rise to reactive responses to toxic behavior. Toxicity in online content and Artificial Intelligence Systems has become a serious challenge to individual and collective well-being around the world. It is more detrimental to society than we realize. Toxicity, expressed in language, image, and video, can be interpreted in various ways depending on the context of usage. Therefore, a comprehensive taxonomy is crucial to detect and mitigate toxicity in online content, Artificial Intelligence systems, and/or Large Language Models in a proactive manner. A comprehensive understanding of toxicity is likely to facilitate the design of practical solutions for toxicity detection and mitigation. The classification in published literature has focused on only a limited number of aspects of this very complex issue, with a pattern of reactive strategies in response to toxicity. This survey attempts to generate a comprehensive taxonomy of toxicity from various perspectives. It presents a holistic approach to explain the toxicity by understanding the context and environment that society is facing in the Artificial Intelligence era. This survey summarizes the toxicity-related datasets and research on toxicity detection and mitigation for Large Language Models, social media platforms, and other online platforms, detailing their attributes in textual mode, focused on the English language. Finally, we suggest the research gaps in toxicity mitigation based on datasets, mitigation strategies, Large Language Models, adaptability, explainability, and evaluation.
Authors:Xin Xiao, Kaiwen Wei, Jiang Zhong, Dongshuo Yin, Yu Tian, Xuekai Wei, Mingliang Zhou
Abstract:
Understanding the similarity between large language models (LLMs) and human brain activity is crucial for advancing both AI and cognitive neuroscience. In this study, we provide a multilinguistic, large-scale assessment of this similarity by systematically comparing 16 publicly available pretrained LLMs with human brain responses during natural language processing tasks in both English and Chinese. Specifically, we use ridge regression to assess the representational similarity between LLM embeddings and electroencephalography (EEG) signals, and analyze the similarity between the "neural trajectory" and the "LLM latent trajectory." This method captures key dynamic patterns, such as magnitude, angle, uncertainty, and confidence. Our findings highlight both similarities and crucial differences in processing strategies: (1) We show that middle-to-high layers of LLMs are central to semantic integration and correspond to the N400 component observed in EEG; (2) The brain exhibits continuous and iterative processing during reading, whereas LLMs often show discrete, stage-end bursts of activity, which suggests a stark contrast in their real-time semantic processing dynamics. This study could offer new insights into LLMs and neural processing, and also establish a critical framework for future investigations into the alignment between artificial intelligence and biological intelligence.
Authors:Qianou Ma, Kenneth Koedinger, Tongshuang Wu
Abstract:
LLMs promise to democratize technical work in complex domains like programmatic data analysis, but not everyone benefits equally. We study how students with varied expertise use LLMs to complete Python-based data analysis in computational notebooks in a non-major course. Drawing on homework logs, recordings, and surveys from 36 students, we ask: Which expertise matters most, and how does it shape AI use? Our mixed-methods analysis shows that technical expertise -- not AI familiarity or communication skills -- remains a significant predictor of success. Students also vary widely in how they leverage LLMs, struggling at stages of forming intent, expressing inputs, interpreting outputs, and assessing results. We identify success and failure behaviors, such as providing context or decomposing prompts, that distinguish effective use. These findings inform AI literacy interventions, highlighting that lightweight demonstrations improve surface fluency but are insufficient; deeper training and scaffolds are needed to cultivate resilient AI use skills.
Authors:Yaqing Yang, Vikram Mohanty, Yan-Ying Chen, Matthew K. Hong, Nikolas Martelaro, Aniket Kittur
Abstract:
Effective ideation requires both broad exploration of diverse ideas and deep evaluation of their potential. Generative AI can support such processes, but current tools typically emphasize either generating many ideas or supporting in-depth consideration of a few, lacking support for both. Research also highlights risks of over-reliance on LLMs, including shallow exploration and negative creative outcomes. We present FlexMind, an AI-augmented system that scaffolds iterative exploration of ideas, tradeoffs, and mitigations. FlexMind exposes users to a broad set of ideas while enabling a lightweight transition into deeper engagement. In a study comparing ideation with FlexMind to ChatGPT, participants generated higher-quality ideas with FlexMind, due to both broader exposure and deeper engagement with tradeoffs. By scaffolding ideation across breadth, depth, and reflective evaluation, FlexMind empowers users to surface ideas that might otherwise go unnoticed or be prematurely discarded.
Authors:Vasilis Andritsoudis, Pavlos Sermpezis, Ilias Dimitriadis, Athena Vakali
Abstract:
The Internet Yellow Pages (IYP) aggregates information from multiple sources about Internet routing into a unified, graph-based knowledge base. However, querying it requires knowledge of the Cypher language and the exact IYP schema, thus limiting usability for non-experts. In this paper, we propose ChatIYP, a domain-specific Retrieval-Augmented Generation (RAG) system that enables users to query IYP through natural language questions. Our evaluation demonstrates solid performance on simple queries, as well as directions for improvement, and provides insights for selecting evaluation metrics that are better fit for IYP querying AI agents.
Authors:Jason Wu, Amanda Swearngin, Arun Krishna Vajjala, Alan Leung, Jeffrey Nichols, Titus Barik
Abstract:
Despite being trained on vast amounts of data, most LLMs are unable to reliably generate well-designed UIs. Designer feedback is essential to improving performance on UI generation; however, we find that existing RLHF methods based on ratings or rankings are not well-aligned with designers' workflows and ignore the rich rationale used to critique and improve UI designs. In this paper, we investigate several approaches for designers to give feedback to UI generation models, using familiar interactions such as commenting, sketching and direct manipulation. We first perform a study with 21 designers where they gave feedback using these interactions, which resulted in ~1500 design annotations. We then use this data to finetune a series of LLMs to generate higher quality UIs. Finally, we evaluate these models with human judges, and we find that our designer-aligned approaches outperform models trained with traditional ranking feedback and all tested baselines, including GPT-5.
Authors:Mengisti Berihu Girmay, Jakob Droste, Hannah Deters, Joerg Doerr
Abstract:
Artificial Intelligence (AI) promises new opportunities across many domains, including agriculture. However, the adoption of AI systems in this sector faces several challenges. System complexity can impede trust, as farmers' livelihoods depend on their decision-making and they may reject opaque or hard-to-understand recommendations. Data privacy concerns also pose a barrier, especially when farmers lack transparency regarding who can access their data and for what purposes. This paper examines dairy farmers' explainability requirements for technical recommendations and data privacy, along with the influence of socio-demographic factors. Based on a mixed-methods study involving 40 German dairy farmers, we identify five user personas through k-means clustering. Our findings reveal varying requirements, with some farmers preferring little detail while others seek full transparency across different aspects. Age, technology experience, and confidence in using digital systems were found to correlate with these explainability requirements. The resulting user personas offer practical guidance for requirements engineers aiming to tailor digital systems more effectively to the diverse requirements of farmers.
Authors:Qianou Ma, Megan Chai, Yike Tan, Jihun Choi, Jini Kim, Erik Harpstead, Geoff Kauffman, Tongshuang Wu
Abstract:
The wide adoption of Generative AI (GenAI) in everyday life highlights the need for greater literacy around its evolving capabilities, biases, and limitations. While many AI literacy efforts focus on children through game-based learning, few interventions support adults in developing a nuanced, reflective understanding of GenAI via playful exploration. To address the gap, we introduce ImaginAItion, a multiplayer party game inspired by Drawful and grounded in the reflective play framework to surface model defaults, biases, and human-AI perception gaps through prompting and discussion. From ten sessions (n=30), we show how gameplay helped adults recognize systematic biases in GenAI, reflect on humans and AI interpretation differences, and adapt their prompting strategies. We also found that group dynamics and composition, such as expertise and diversity, amplified or muted reflection. Our work provides a starting point to scale critical GenAI literacy through playful, social interventions resilient to rapidly evolving technologies.
Authors:Anna Deichler, Siyang Wang, Simon Alexanderson, Jonas Beskow
Abstract:
Pointing is a key mode of interaction with robots, yet most prior work has focused on recognition rather than generation. We present a motion capture dataset of human pointing gestures covering diverse styles, handedness, and spatial targets. Using reinforcement learning with motion imitation, we train policies that reproduce human-like pointing while maximizing precision. Results show our approach enables context-aware pointing behaviors in simulation, balancing task performance with natural dynamics.
Authors:Axel Wiebe Werner, Jonas Beskow, Anna Deichler
Abstract:
Gestures are central to human communication, enriching interactions through non-verbal expression. Virtual avatars increasingly use AI-generated gestures to enhance life-likeness, yet evaluations have largely been confined to 2D. Virtual Reality (VR) provides an immersive alternative that may affect how gestures are perceived. This paper presents a comparative evaluation of computer-generated gestures in VR and 2D, examining three models from the 2023 GENEA Challenge. Results show that gestures viewed in VR were rated slightly higher on average, with the strongest effect observed for motion-capture "true movement." While model rankings remained consistent across settings, VR influenced participants' overall perception and offered unique benefits over traditional 2D evaluation.
Authors:Anna Deichler, Siyang Wang, Simon Alexanderson, Jonas Beskow
Abstract:
One of the main goals of robotics and intelligent agent research is to enable natural communication with humans in physically situated settings. While recent work has focused on verbal modes such as language and speech, non-verbal communication is crucial for flexible interaction. We present a framework for generating pointing gestures in embodied agents by combining imitation and reinforcement learning. Using a small motion capture dataset, our method learns a motor control policy that produces physically valid, naturalistic gestures with high referential accuracy. We evaluate the approach against supervised learning and retrieval baselines in both objective metrics and a virtual reality referential game with human users. Results show that our system achieves higher naturalness and accuracy than state-of-the-art supervised models, highlighting the promise of imitation-RL for communicative gesture generation and its potential application to robots.
Authors:Yaqing Yang, Vikram Mohanty, Nikolas Martelaro, Aniket Kittur, Yan-Ying Chen, Matthew K. Hong
Abstract:
Divergent thinking in the ideation stage of creative problem-solving demands that individuals explore a broad design space. Yet this exploration rarely follows a neat, linear sequence; problem-solvers constantly shift among searching, creating, and evaluating ideas. Existing interfaces either impose rigid, step-by-step workflows or permit unguided free-form exploration. To strike a balance between flexibility and guidance for augmenting people's efficiency and creativity, we introduce a human-AI collaborative workflow that supports a fluid ideation process. The system surfaces three opt-in aids: (1) high-level schemas to uncover alternative ideas, (2) risk analysis with mitigation suggestions, and (3) steering system-generated suggestions. Users can invoke these supports at any moment, allowing seamless back-and-forth movement among design actions to maintain creative momentum.
Authors:Tae Soo Kim, Heechan Lee, Yoonjoo Lee, Joseph Seering, Juho Kim
Abstract:
Practitioners increasingly rely on Large Language Models (LLMs) to evaluate generative AI outputs through "LLM-as-a-Judge" approaches. However, these methods produce holistic scores that obscure which specific elements influenced the assessments. We propose functional fragmentation, a method that dissects each output into key fragments and interprets the rhetoric functions that each fragment serves relative to evaluation criteria -- surfacing the elements of interest and revealing how they fulfill or hinder user goals. We instantiate this approach in Evalet, an interactive system that visualizes fragment-level functions across many outputs to support inspection, rating, and comparison of evaluations. A user study (N=10) found that, while practitioners struggled to validate holistic scores, our approach helped them identify 48% more evaluation misalignments. This helped them calibrate trust in LLM evaluations and rely on them to find more actionable issues in model outputs. Our work shifts LLM evaluation from quantitative scores toward qualitative, fine-grained analysis of model behavior.
Authors:Sankalp Tattwadarshi Swain, Anshika Krishnatray, Dhruv Kumar, Jagat Sesh Challa
Abstract:
Existing evaluation studies on linguistic competence of large language models (LLM agents) have focused primarily on vocabulary learning, morphological rule induction, syntactic generalization, pragmatic inference, and cross-linguistic transfer. However, none assess whether LLM agents can acquire a language through pattern recognition and interactive feedback, a central feature of human language acquisition. We propose a novel experimental framework in which an LLM agent is evaluated on its ability to acquire and use a newly constructed language (Tinkatongue) in conversation with a bot that understands only Tinkatongue. Our findings show that LLM agents fail to establish a conversation within 100 responses, yet they adopt distinct strategies that mirror human approaches to language learning. The results suggest a new direction for evaluation benchmarks and open pathways to model designs that learn more effectively from interactive feedback.
Authors:Yunnong Chen, Chengwei Shi, Liuqing Chen
Abstract:
Large language models (LLMs) promise to accelerate UI design, yet current tools struggle with two fundamentals: externalizing designers' intent and controlling iterative change. We introduce SPEC, a structured, parameterized, hierarchical intermediate representation that exposes UI elements as controllable parameters. Building on SPEC, we present SpecifyUI, an interactive system that extracts SPEC from UI references via region segmentation and vision-language models, composes UIs across multiple sources, and supports targeted edits at global, regional, and component levels. A multi-agent generator renders SPEC into high-fidelity designs, closing the loop between intent expression and controllable generation. Quantitative experiments show SPEC-based generation more faithfully captures reference intent than prompt-based baselines. In a user study with 16 professional designers, SpecifyUI significantly outperformed Stitch on intent alignment, design quality, controllability, and overall experience in human-AI co-creation. Our results position SPEC as a specification-driven paradigm that shifts LLM-assisted design from one-shot prompting to iterative, collaborative workflows.
Authors:Eduardo Davalos, Yike Zhang, Shruti Jain, Namrata Srivastava, Trieu Truong, Nafees-ul Haque, Tristan Van, Jorge Salas, Sara McFadden, Sun-Joo Cho, Gautam Biswas, Amanda Goodwin
Abstract:
Eye-tracking offers rich insights into student cognition and engagement, but remains underutilized in classroom-facing educational technology due to challenges in data interpretation and accessibility. In this paper, we present the iterative design and evaluation of a gaze-based learning analytics dashboard for English Language Arts (ELA), developed through five studies involving teachers and students. Guided by user-centered design and data storytelling principles, we explored how gaze data can support reflection, formative assessment, and instructional decision-making. Our findings demonstrate that gaze analytics can be approachable and pedagogically valuable when supported by familiar visualizations, layered explanations, and narrative scaffolds. We further show how a conversational agent, powered by a large language model (LLM), can lower cognitive barriers to interpreting gaze data by enabling natural language interactions with multimodal learning analytics. We conclude with design implications for future EdTech systems that aim to integrate novel data modalities in classroom contexts.
Authors:Chi Sun, Xian Wang, Abhishek Kumar, Chengbin Cui, Lik-Hang Lee
Abstract:
Effective human-robot interaction (HRI) in multi-object teleoperation tasks faces significant challenges due to perceptual ambiguities in virtual reality (VR) environments and the limitations of single-modality intention recognition. This paper proposes a shared control framework that combines a virtual admittance (VA) model with a Multimodal-CNN-based Human Intention Perception Network (MMIPN) to enhance teleoperation performance and user experience. The VA model employs artificial potential fields to guide operators toward target objects by adjusting admittance force and optimizing motion trajectories. MMIPN processes multimodal inputs, including gaze movement, robot motions, and environmental context, to estimate human grasping intentions, helping to overcome depth perception challenges in VR. Our user study evaluated four conditions across two factors, and the results showed that MMIPN significantly improved grasp success rates, while the VA model enhanced movement efficiency by reducing path lengths. Gaze data emerged as the most crucial input modality. These findings demonstrate the effectiveness of combining multimodal cues with implicit guidance in VR-based teleoperation, providing a robust solution for multi-object grasping tasks and enabling more natural interactions across various applications in the future.
Authors:Xuanru Cheng, Xian Wang, Chi-lok Tai, Lik-Hang Lee
Abstract:
Promoting public health is challenging owing to its abstract nature, and individuals may be apprehensive about confronting it. Recently, there has been an increasing interest in using the metaverse and gamification as novel educational techniques to improve learning experiences related to the immune system. Thus, we present MetaRoundWorm, an immersive virtual reality (VR) escape room game designed to enhance the understanding of parasitic infections and host immune responses through interactive, gamified learning. The application simulates the lifecycle of Ascaris lumbricoides and corresponding immunological mechanisms across anatomically accurate environments within the human body. Integrating serious game mechanics with embodied learning principles, MetaRoundWorm offers players a task-driven experience combining exploration, puzzle-solving, and immune system simulation. To evaluate the educational efficacy and user engagement, we conducted a controlled study comparing MetaRoundWorm against a traditional approach, i.e., interactive slides. Results indicate that MetaRoundWorm significantly improves immediate learning outcomes, cognitive engagement, and emotional experience, while maintaining knowledge retention over time. Our findings suggest that immersive VR gamification holds promise as an effective pedagogical tool for communicating complex biomedical concepts and advancing digital health education.
Authors:Yi-Hao Peng, Dingzeyu Li, Jeffrey P. Bigham, Amy Pavel
Abstract:
User interface (UI) agents promise to make inaccessible or complex UIs easier to access for blind and low-vision (BLV) users. However, current UI agents typically perform tasks end-to-end without involving users in critical choices or making them aware of important contextual information, thus reducing user agency. For example, in our field study, a BLV participant asked to buy the cheapest available sparkling water, and the agent automatically chose one from several equally priced options, without mentioning alternative products with different flavors or better ratings. To address this problem, we introduce Morae, a UI agent that automatically identifies decision points during task execution and pauses so that users can make choices. Morae uses large multimodal models to interpret user queries alongside UI code and screenshots, and prompt users for clarification when there is a choice to be made. In a study over real-world web tasks with BLV participants, Morae helped users complete more tasks and select options that better matched their preferences, as compared to baseline agents, including OpenAI Operator. More broadly, this work exemplifies a mixed-initiative approach in which users benefit from the automation of UI agents while being able to express their preferences.
Authors:Jingyao Zheng, Haodi Weng, Xian Wang, Chengbin Cui, Sven Mayer, Chi-lok Tai, Lik-Hang Lee
Abstract:
Mixed Reality (MR) is increasingly integrated into daily life, providing enhanced capabilities across various domains. However, users face growing notification streams that disrupt their immersive experience. We present PersoNo, a personalised notification urgency classifier for MR that intelligently classifies notifications based on individual user preferences. Through a user study (N=18), we created the first MR notification dataset containing both self-labelled and interaction-based data across activities with varying cognitive demands. Our thematic analysis revealed that, unlike in mobiles, the activity context is equally important as the content and the sender in determining notification urgency in MR. Leveraging these insights, we developed PersoNo using large language models that analyse users replying behaviour patterns. Our multi-agent approach achieved 81.5% accuracy and significantly reduced false negative rates (0.381) compared to baseline models. PersoNo has the potential not only to reduce unnecessary interruptions but also to offer users understanding and control of the system, adhering to Human-Centered Artificial Intelligence design principles.
Authors:Rashid Mushkani, Hugo Berard, Shin Koseki
Abstract:
Community consultations are integral to urban planning processes intended to incorporate diverse stakeholder perspectives. However, limited resources, visual and spoken language barriers, and uneven power dynamics frequently constrain inclusive decision-making. This paper examines how generative text-to-image methods, specifically Stable Diffusion XL integrated into a custom platform (WeDesign), may support equitable consultations. A half-day workshop in Montreal involved five focus groups, each consisting of architects, urban designers, AI specialists, and residents from varied demographic groups. Additional data was gathered through semi-structured interviews with six urban planning professionals. Participants indicated that immediate visual outputs facilitated creativity and dialogue, yet noted issues in visualizing specific needs of marginalized groups, such as participants with reduced mobility, accurately depicting local architectural elements, and accommodating bilingual prompts. Participants recommended the development of an open-source platform incorporating in-painting tools, multilingual support, image voting functionalities, and preference indicators. The results indicate that generative AI can broaden participation and enable iterative interactions but requires structured facilitation approaches. The findings contribute to discussions on generative AI's role and limitations in participatory urban design.
Authors:NVIDIA, :, Chaeyeon Chung, Ilya Fedorov, Michael Huang, Aleksey Karmanov, Dmitry Korobchenko, Roger Ribera, Yeongho Seol
Abstract:
Audio-driven facial animation presents an effective solution for animating digital avatars. In this paper, we detail the technical aspects of NVIDIA Audio2Face-3D, including data acquisition, network architecture, retargeting methodology, evaluation metrics, and use cases. Audio2Face-3D system enables real-time interaction between human users and interactive avatars, facilitating facial animation authoring for game characters. To assist digital avatar creators and game developers in generating realistic facial animations, we have open-sourced Audio2Face-3D networks, SDK, training framework, and example dataset.
Authors:Sebastian Lubos, Alexander Felfernig, Gerhard Leitner, Julian Schwazer
Abstract:
Usability describes a set of essential quality attributes of user interfaces (UI) that influence human-computer interaction. Common evaluation methods, such as usability testing and inspection, are effective but resource-intensive and require expert involvement. This makes them less accessible for smaller organizations. Recent advances in multimodal LLMs offer promising opportunities to automate usability evaluation processes partly by analyzing textual, visual, and structural aspects of software interfaces. To investigate this possibility, we formulate usability evaluation as a recommendation task, where multimodal LLMs rank usability issues by severity. We conducted an initial proof-of-concept study to compare LLM-generated usability improvement recommendations with usability expert assessments. Our findings indicate the potential of LLMs to enable faster and more cost-effective usability evaluation, which makes it a practical alternative in contexts with limited expert resources.
Authors:Mohit Sharma, Emma Nilsson, Martin Falk, Talha Bin Masood, Lee Jollans, Anders Persson, Tino Ebbers, Ingrid Hotz
Abstract:
Photon-Counting Computed Tomography (PCCT) is a novel imaging modality that simultaneously acquires volumetric data at multiple X-ray energy levels, generating separate volumes that capture energy-dependent attenuation properties. Attenuation refers to the reduction in X-ray intensity as it passes through different tissues or materials. This spectral information enhances tissue and material differentiation, enabling more accurate diagnosis and analysis. However, the resulting multivolume datasets are often complex and redundant, making visualization and interpretation challenging. To address these challenges, we propose a method for fusing spectral PCCT data into a single representative volume that enables direct volume rendering and segmentation by leveraging both shared and complementary information across different channels. Our approach starts by computing 2D histograms between pairs of volumes to identify those that exhibit prominent structural features. These histograms reveal relationships and variations that may be difficult to discern from individual volumes alone. Next, we construct an extremum graph from the 2D histogram of two minimally correlated yet complementary volumes-selected to capture both shared and distinct features-thereby maximizing the information content. The graph captures the topological distribution of histogram extrema. By extracting prominent structure within this graph and projecting each grid point in histogram space onto it, we reduce the dimensionality to one, producing a unified volume. This representative volume retains key structural and material characteristics from the original spectral data while significantly reducing the analysis scope from multiple volumes to one. The result is a topology-aware, information-rich fusion of multi-energy CT datasets that facilitates more effective visualization and segmentation.
Authors:Faruk Alpay, Taylan Alpay
Abstract:
We formalize three design axioms for sustained adoption of agent-centric AI systems executing multi-step tasks: (A1) Reliability > Novelty; (A2) Embed > Destination; (A3) Agency > Chat. We model adoption as a sum of a decaying novelty term and a growing utility term and derive the phase conditions for troughs/overshoots with full proofs. We introduce: (i) an identifiability/confounding analysis for $(α,β,N_0,U_{\max})$ with delta-method gradients; (ii) a non-monotone comparator (logistic-with-transient-bump) evaluated on the same series to provide additional model comparison; (iii) ablations over hazard families $h(\cdot)$ mapping $ÎV \to β$; (iv) a multi-series benchmark (varying trough depth, noise, AR structure) reporting coverage (type-I error, power); (v) calibration of friction proxies against time-motion/survey ground truth with standard errors; (vi) residual analyses (autocorrelation and heteroskedasticity) for each fitted curve; (vii) preregistered windowing choices for pre/post estimation; (viii) Fisher information & CRLB for $(α,β)$ under common error models; (ix) microfoundations linking $\mathcal{T}$ to $(N_0,U_{\max})$; (x) explicit comparison to bi-logistic, double-exponential, and mixture models; and (xi) threshold sensitivity to $C_f$ heterogeneity. Figures and tables are reflowed for readability, and the bibliography restores and extends non-logistic/Bass adoption references (Gompertz, Richards, Fisher-Pry, Mansfield, Griliches, Geroski, Peres). All code and logs necessary to reproduce the synthetic analyses are embedded as LaTeX listings.
Authors:Cesar Alan Contreras, Manolis Chiou, Alireza Rastegarpanah, Michal Szulik, Rustam Stolkin
Abstract:
Human-robot collaboration requires robots to quickly infer user intent, provide transparent reasoning, and assist users in achieving their goals. Our recent work introduced GUIDER, our framework for inferring navigation and manipulation intents. We propose augmenting GUIDER with a vision-language model (VLM) and a text-only language model (LLM) to form a semantic prior that filters objects and locations based on the mission prompt. A vision pipeline (YOLO for object detection and the Segment Anything Model for instance segmentation) feeds candidate object crops into the VLM, which scores their relevance given an operator prompt; in addition, the list of detected object labels is ranked by a text-only LLM. These scores weight the existing navigation and manipulation layers of GUIDER, selecting context-relevant targets while suppressing unrelated objects. Once the combined belief exceeds a threshold, autonomy changes occur, enabling the robot to navigate to the desired area and retrieve the desired object, while adapting to any changes in the operator's intent. Future work will evaluate the system on Isaac Sim using a Franka Emika arm on a Ridgeback base, with a focus on real-time assistance.
Authors:Kaixun Yang, Yizhou Fan, Luzhen Tang, Mladen RakoviÄ, Xinyu Li, Dragan GaÅ¡eviÄ, Guanliang Chen
Abstract:
The integration of Generative AI (GenAI) into education is reshaping how students learn, making self-regulated learning (SRL) - the ability to plan, monitor, and adapt one's learning - more important than ever. To support learners in these new contexts, it is essential to understand how SRL unfolds during interaction with GenAI tools. Learning analytics offers powerful techniques for analyzing digital trace data to infer SRL behaviors. However, existing approaches often assume SRL processes are linear, segmented, and non-overlapping-assumptions that overlook the dynamic, recursive, and non-linear nature of real-world learning. We address this by conceptualizing SRL as a layered system: observable learning patterns reflect hidden tactics (short, purposeful action states), which combine into broader SRL strategies. Using Hidden Markov Models (HMMs), we analyzed trace data from higher education students engaged in GenAI-assisted academic writing. We identified three distinct groups of learners, each characterized by different SRL strategies. These groups showed significant differences in performance, indicating that students' use of different SRL strategies in GenAI-assisted writing led to varying task outcomes. Our findings advance the methodological toolkit for modeling SRL and inform the design of adaptive learning technologies that more effectively support learners in GenAI-enhanced educational environments.
Authors:Yancheng Jiang, Yan Jiang, Ruochen Zhou, Yi-Chao Chen, Xiaoyu Ji, Wenyuan Xu
Abstract:
Virtual Reality (VR) techniques, serving as the bridge between the real and virtual worlds, have boomed and are widely used in manufacturing, remote healthcare, gaming, etc. Specifically, VR systems offer users immersive experiences that include both perceptions and actions. Various studies have demonstrated that attackers can manipulate VR software to influence users' interactions, including perception and actions. However, such attacks typically require strong access and specialized expertise. In this paper, we are the first to present a systematic analysis of physical attacks against VR systems and introduce False Reality, a new attack threat to VR devices without requiring access to or modification of their software. False Reality disturbs VR system services by tampering with sensor measurements, and further spoofing users' perception even inducing harmful actions, e.g., inducing dizziness or causing users to crash into obstacles, by exploiting perceptual and psychological effects. We formalize these threats through an attack pathway framework and validate three representative pathways via physical experiments and user studies on five commercial VR devices. Finally, we further propose a defense prototype to mitigate such threats. Our findings shall provide valuable insights for enhancing the security and resilience of future VR systems.
Authors:Mohamed Rayan Barhdadi, Mehmet Tuncel, Erchin Serpedin, Hasan Kurban
Abstract:
Current AI approaches to refugee integration optimize narrow objectives such as employment and fail to capture the cultural, emotional, and ethical dimensions critical for long-term success. We introduce EMPATHIA (Enriched Multimodal Pathways for Agentic Thinking in Humanitarian Immigrant Assistance), a multi-agent framework addressing the central Creative AI question: how do we preserve human dignity when machines participate in life-altering decisions? Grounded in Kegan's Constructive Developmental Theory, EMPATHIA decomposes integration into three modules: SEED (Socio-cultural Entry and Embedding Decision) for initial placement, RISE (Rapid Integration and Self-sufficiency Engine) for early independence, and THRIVE (Transcultural Harmony and Resilience through Integrated Values and Engagement) for sustained outcomes. SEED employs a selector-validator architecture with three specialized agents - emotional, cultural, and ethical - that deliberate transparently to produce interpretable recommendations. Experiments on the UN Kakuma dataset (15,026 individuals, 7,960 eligible adults 15+ per ILO/UNHCR standards) and implementation on 6,359 working-age refugees (15+) with 150+ socioeconomic variables achieved 87.4% validation convergence and explainable assessments across five host countries. EMPATHIA's weighted integration of cultural, emotional, and ethical factors balances competing value systems while supporting practitioner-AI collaboration. By augmenting rather than replacing human expertise, EMPATHIA provides a generalizable framework for AI-driven allocation tasks where multiple values must be reconciled.
Authors:Hamza El Alaoui, Atieh Taheri, Yi-Hao Peng, Jeffrey P. Bigham
Abstract:
People frequently use speech-to-text systems to compose short texts with voice. However, current voice-based interfaces struggle to support composing more detailed, contextually complex texts, especially in scenarios where users are on the move and cannot visually track progress. Longer-form communication, such as composing structured emails or thoughtful responses, requires persistent context tracking, structured guidance, and adaptability to evolving user intentions--capabilities that conventional dictation tools and voice assistants do not support. We introduce StepWrite, a large language model-driven voice-based interaction system that augments human writing ability by enabling structured, hands-free and eyes-free composition of longer-form texts while on the move. StepWrite decomposes the writing process into manageable subtasks and sequentially guides users with contextually-aware non-visual audio prompts. StepWrite reduces cognitive load by offloading the context-tracking and adaptive planning tasks to the models. Unlike baseline methods like standard dictation features (e.g., Microsoft Word) and conversational voice assistants (e.g., ChatGPT Advanced Voice Mode), StepWrite dynamically adapts its prompts based on the evolving context and user intent, and provides coherent guidance without compromising user autonomy. An empirical evaluation with 25 participants engaging in mobile or stationary hands-occupied activities demonstrated that StepWrite significantly reduces cognitive load, improves usability and user satisfaction compared to baseline methods. Technical evaluations further confirmed StepWrite's capability in dynamic contextual prompt generation, accurate tone alignment, and effective fact checking. This work highlights the potential of structured, context-aware voice interactions in enhancing hands-free and eye-free communication in everyday multitasking scenarios.
Authors:Wenshuo Zhang, Leixian Shen, Shuchang Xu, Jindu Wang, Jian Zhao, Huamin Qu, Linping Yuan
Abstract:
Conversational LLMs have been widely adopted by domain users with limited programming experience to solve domain problems. However, these users often face misalignment between their intent and generated code, resulting in frustration and rounds of clarification. This work first investigates the cause of this misalignment, which dues to bidirectional ambiguity: both user intents and coding tasks are inherently nonlinear, yet must be expressed and interpreted through linear prompts and code sequences. To address this, we propose direct intent-task matching, a new human-LLM interaction paradigm that externalizes and enables direct manipulation of the LLM understanding, i.e., the coding tasks and their relationships inferred by the LLM prior to code generation. As a proof-of-concept, this paradigm is then implemented in NeuroSync, which employs a knowledge distillation pipeline to extract LLM understanding, user intents, and their mappings, and enhances the alignment by allowing users to intuitively inspect and edit them via visualizations. We evaluate the algorithmic components of NeuroSync via technical experiments, and assess its overall usability and effectiveness via a user study (N=12). The results show that it enhances intent-task alignment, lowers cognitive effort, and improves coding efficiency.
Authors:Akshay Paruchuri, Sinan Hersek, Lavisha Aggarwal, Qiao Yang, Xin Liu, Achin Kulshrestha, Andrea Colaco, Henry Fuchs, Ishan Chatterjee
Abstract:
All-day smart glasses are likely to emerge as platforms capable of continuous contextual sensing, uniquely positioning them for unprecedented assistance in our daily lives. Integrating the multi-modal AI agents required for human memory enhancement while performing continuous sensing, however, presents a major energy efficiency challenge for all-day usage. Achieving this balance requires intelligent, context-aware sensor management. Our approach, EgoTrigger, leverages audio cues from the microphone to selectively activate power-intensive cameras, enabling efficient sensing while preserving substantial utility for human memory enhancement. EgoTrigger uses a lightweight audio model (YAMNet) and a custom classification head to trigger image capture from hand-object interaction (HOI) audio cues, such as the sound of a drawer opening or a medication bottle being opened. In addition to evaluating on the QA-Ego4D dataset, we introduce and evaluate on the Human Memory Enhancement Question-Answer (HME-QA) dataset. Our dataset contains 340 human-annotated first-person QA pairs from full-length Ego4D videos that were curated to ensure that they contained audio, focusing on HOI moments critical for contextual understanding and memory. Our results show EgoTrigger can use 54% fewer frames on average, significantly saving energy in both power-hungry sensing components (e.g., cameras) and downstream operations (e.g., wireless transmission), while achieving comparable performance on datasets for an episodic memory task. We believe this context-aware triggering strategy represents a promising direction for enabling energy-efficient, functional smart glasses capable of all-day use -- supporting applications like helping users recall where they placed their keys or information about their routine activities (e.g., taking medications).
Authors:Tae Soo Kim, Yoonjoo Lee, Yoonah Park, Jiho Kim, Young-Ho Kim, Juho Kim
Abstract:
Personalization of Large Language Models (LLMs) often assumes users hold static preferences that reflect globally in all tasks. In reality, humans hold dynamic preferences that change depending on the context. As users interact with an LLM in various contexts, they naturally reveal their contextual preferences, which a model must infer and apply in future contexts to ensure alignment. To assess this, we introduce CUPID, a benchmark of 756 human-curated interaction session histories between users and LLM-based chat assistants. In each interaction session, the user provides a request in a specific context and expresses their preference through multi-turn feedback. Given a new user request and prior interaction sessions, our benchmark assesses whether LLMs can infer the preference relevant to this request and generate a response that satisfies this preference. With CUPID, we evaluated 10 open and proprietary LLMs, revealing that state-of-the-art LLMs struggle to infer preferences from multi-turn interactions and fail to discern what previous context is relevant to a new request -- under 50% precision and 65% recall. Our work highlights the need to advance LLM capabilities for more contextually personalized interactions and proposes CUPID as a resource to drive these improvements.
Authors:Mohsen Abbaspour Onari, Lucie Charlotte Magister, Yaoxin Wu, Amalia Lupi, Dario Creazzo, Mattia Tordin, Luigi Di Donatantonio, Emilio Quaia, Chao Zhang, Isel Grau, Marco S. Nobile, Yingqian Zhang, Pietro Liò
Abstract:
Distal myopathy represents a genetically heterogeneous group of skeletal muscle disorders with broad clinical manifestations, posing diagnostic challenges in radiology. To address this, we propose a novel multimodal attention-aware fusion architecture that combines features extracted from two distinct deep learning models, one capturing global contextual information and the other focusing on local details, representing complementary aspects of the input data. Uniquely, our approach integrates these features through an attention gate mechanism, enhancing both predictive performance and interpretability. Our method achieves a high classification accuracy on the BUSI benchmark and a proprietary distal myopathy dataset, while also generating clinically relevant saliency maps that support transparent decision-making in medical diagnosis. We rigorously evaluated interpretability through (1) functionally grounded metrics, coherence scoring against reference masks and incremental deletion analysis, and (2) application-grounded validation with seven expert radiologists. While our fusion strategy boosts predictive performance relative to single-stream and alternative fusion strategies, both quantitative and qualitative evaluations reveal persistent gaps in anatomical specificity and clinical usefulness of the interpretability. These findings highlight the need for richer, context-aware interpretability methods and human-in-the-loop feedback to meet clinicians' expectations in real-world diagnostic settings.
Authors:Tom Peterka, Tanwi Mallick, Orcun Yildiz, David Lenz, Cory Quammen, Berk Geveci
Abstract:
Large language models (LLMs) are rapidly increasing in capability, but they still struggle with highly specialized programming tasks such as scientific visualization. We present an LLM assistant, ChatVis, that aids the LLM to generate Python code for ParaView scientific visualization tasks, without the need for retraining or fine-tuning the LLM. ChatVis employs chain-of-thought prompt simplification, retrieval-augmented prompt generation using a vector database of documentation and code examples, and error checking with iterative prompt feedback to correct errors until a visualization is produced. An integral part of our approach is a benchmark suite of canonical visualization tasks, ParaView regression tests, and scientific use cases that includes comprehensive evaluation metrics. We evaluate our visualization assistant by comparing results with a variety of top-performing unassisted LLMs. We find that all the metrics are significantly improved with ChatVis.
Authors:Hussein Mozannar, Gagan Bansal, Cheng Tan, Adam Fourney, Victor Dibia, Jingya Chen, Jack Gerrits, Tyler Payne, Matheus Kunzler Maldaner, Madeleine Grunde-McLaughlin, Eric Zhu, Griffin Bassman, Jacob Alber, Peter Chang, Ricky Loynd, Friederike Niedtner, Ece Kamar, Maya Murad, Rafah Hosn, Saleema Amershi
Abstract:
AI agents powered by large language models are increasingly capable of autonomously completing complex, multi-step tasks using external tools. Yet, they still fall short of human-level performance in most domains including computer use, software development, and research. Their growing autonomy and ability to interact with the outside world, also introduces safety and security risks including potentially misaligned actions and adversarial manipulation. We argue that human-in-the-loop agentic systems offer a promising path forward, combining human oversight and control with AI efficiency to unlock productivity from imperfect systems. We introduce Magentic-UI, an open-source web interface for developing and studying human-agent interaction. Built on a flexible multi-agent architecture, Magentic-UI supports web browsing, code execution, and file manipulation, and can be extended with diverse tools via Model Context Protocol (MCP). Moreover, Magentic-UI presents six interaction mechanisms for enabling effective, low-cost human involvement: co-planning, co-tasking, multi-tasking, action guards, and long-term memory. We evaluate Magentic-UI across four dimensions: autonomous task completion on agentic benchmarks, simulated user testing of its interaction capabilities, qualitative studies with real users, and targeted safety assessments. Our findings highlight Magentic-UI's potential to advance safe and efficient human-agent collaboration.
Authors:Lin Gao, Leixian Shen, Yuheng Zhao, Jiexiang Lan, Huamin Qu, Siming Chen
Abstract:
In data-driven storytelling contexts such as data journalism and data videos, data visualizations are often presented alongside real-world imagery to support narrative context. However, these visualizations and contextual images typically remain separated, limiting their combined narrative expressiveness and engagement. Achieving this is challenging due to the need for fine-grained alignment and creative ideation. To address this, we present SceneLoom, a Vision-Language Model (VLM)-powered system that facilitates the coordination of data visualization with real-world imagery based on narrative intents. Through a formative study, we investigated the design space of coordination relationships between data visualization and real-world scenes from the perspectives of visual alignment and semantic coherence. Guided by the derived design considerations, SceneLoom leverages VLMs to extract visual and semantic features from scene images and data visualization, and perform design mapping through a reasoning process that incorporates spatial organization, shape similarity, layout consistency, and semantic binding. The system generates a set of contextually expressive, image-driven design alternatives that achieve coherent alignments across visual, semantic, and data dimensions. Users can explore these alternatives, select preferred mappings, and further refine the design through interactive adjustments and animated transitions to support expressive data communication. A user study and an example gallery validate SceneLoom's effectiveness in inspiring creative design and facilitating design externalization.
Authors:Max Rädler, Mark Colley, Enrico Rukzio
Abstract:
People with vision impairments (VIPs) often rely on their remaining vision when interacting with user interfaces. Simulating visual impairments is an effective tool for designers, fostering awareness of the challenges faced by VIPs. While previous research has introduced various vision impairment simulators, none have yet been developed with the direct involvement of VIPs or thoroughly evaluated from their perspective. To address this gap, we developed VIP-Sim. This symptom-based vision simulator was created through a participatory design process tailored explicitly for this purpose, involving N=7 VIPs. 21 symptoms, like field loss or light sensitivity, can be overlaid on desktop design tools. Most participants felt VIP-Sim could replicate their symptoms. VIP-Sim was received positively, but concerns about exclusion in design and comprehensiveness of the simulation remain, mainly whether it represents the experiences of other VIPs.
Authors:Cesar Alan Contreras, Manolis Chiou, Alireza Rastegarpanah, Michal Szulik, Rustam Stolkin
Abstract:
Accurate inference of human intent enables human-robot collaboration without constraining human control or causing conflicts between humans and robots. We present GUIDER (Global User Intent Dual-phase Estimation for Robots), a probabilistic framework that enables a robot to estimate the intent of human operators. GUIDER maintains two coupled belief layers, one tracking navigation goals and the other manipulation goals. In the Navigation phase, a Synergy Map blends controller velocity with an occupancy grid to rank interaction areas. Upon arrival at a goal, an autonomous multi-view scan builds a local 3D cloud. The Manipulation phase combines U2Net saliency, FastSAM instance saliency, and three geometric grasp-feasibility tests, with an end-effector kinematics-aware update rule that evolves object probabilities in real-time. GUIDER can recognize areas and objects of intent without predefined goals. We evaluated GUIDER on 25 trials (five participants x five task variants) in Isaac Sim, and compared it with two baselines, one for navigation and one for manipulation. Across the 25 trials, GUIDER achieved a median stability of 93-100% during navigation, compared with 60-100% for the BOIR baseline, with an improvement of 39.5% in a redirection scenario (T5). During manipulation, stability reached 94-100% (versus 69-100% for Trajectron), with a 31.4% difference in a redirection task (T3). In geometry-constrained trials (manipulation), GUIDER recognized the object intent three times earlier than Trajectron (median remaining time to confident prediction 23.6 s vs 7.8 s). These results validate our dual-phase framework and show improvements in intent inference in both phases of mobile manipulation tasks.
Authors:Mike Kentros, Manos Kamarianakis, Michael Cole, Vitaliy Popov, Antonis Protopsaltis, George Papagiannakis
Abstract:
This paper introduces iREACT, a novel VR simulation addressing key limitations in traditional cardiac arrest (CA) training. Conventional methods struggle to replicate the dynamic nature of real CA events, hindering Crew Resource Management (CRM) skill development. iREACT provides a non-linear, collaborative environment where teams respond to changing patient states, mirroring real CA complexities. By capturing multi-modal data (user actions, cognitive load, visual gaze) and offering real-time and post-session feedback, iREACT enhances CRM assessment beyond traditional methods. A formative evaluation with medical experts underscores its usability and educational value, with potential applications in other high-stakes training scenarios to improve teamwork, communication, and decision-making.
Authors:Sohshi Yoshida, Ko Watanabe, Andreas Dengel, Shoya Ishimaru, Shingo Ata, Manato Fujimoto
Abstract:
Prolonged sitting is a health risk leading to metabolic and cardiovascular diseases. To combat this, various "nudging" strategies encourage stand-ups. Behavior change triggers use explicit prompts such as smartphone push notifications or light controls. However, comparisons of the effects of such interactions, discomfort, and user context have not yet been performed. The present study evaluated these methods in a mixed design experiment with 15 college students. Three intervention methods (none, push notifications, and light dimming) and three user task contexts (computer work, video calls, and reading) were tested. The frequency of standing up and comfort were assessed after each ten-minute session. Results showed that dimming resulted in slightly more breaks (1.4 \pm 1.55) than push notification (1.2 \pm 1.08), but caused discomfort for 66.7% of participants, compared to 20% for notification. The results were influenced by task context. Dimming was most effective during video calls and reading, while push notifications were more effective during computer work. These findings suggest adaptive nudging systems should tailor interventions based on context and individual preferences.
Authors:Longjie Guo, Chenjie Yuan, Mingyuan Zhong, Robert Wolfe, Ruican Zhong, Yue Xu, Bingbing Wen, Hua Shen, Lucy Lu Wang, Alexis Hiniker
Abstract:
As LLM-based computer-use agents (CUAs) begin to autonomously interact with real-world interfaces, understanding their vulnerability to manipulative interface designs becomes increasingly critical. We introduce SusBench, an online benchmark for evaluating the susceptibility of CUAs to UI dark patterns, designs that aim to manipulate or deceive users into taking unintentional actions. Drawing nine common dark pattern types from existing taxonomies, we developed a method for constructing believable dark patterns on real-world consumer websites through code injections, and designed 313 evaluation tasks across 55 websites. Our study with 29 participants showed that humans perceived our dark pattern injections to be highly realistic, with the vast majority of participants not noticing that these had been injected by the research team. We evaluated five state-of-the-art CUAs on the benchmark. We found that both human participants and agents are particularly susceptible to the dark patterns of Preselection, Trick Wording, and Hidden Information, while being resilient to other overt dark patterns. Our findings inform the development of more trustworthy CUAs, their use as potential human proxies in evaluating deceptive designs, and the regulation of an online environment increasingly navigated by autonomous agents.
Authors:Zixuan Feng, Sadia Afroz, Anita Sarma
Abstract:
Generative AI (GenAI) is rapidly reshaping software development workflows. While prior studies emphasize productivity gains, the adoption of GenAI also introduces new pressures that may harm developers' well-being. In this paper, we investigate the relationship between the adoption of GenAI and developers' burnout. We utilized the Job Demands--Resources (JD--R) model as the analytic lens in our empirical study. We employed a concurrent embedded mixed-methods research design, integrating quantitative and qualitative evidence. We first surveyed 442 developers across diverse organizations, roles, and levels of experience. We then employed Partial Least Squares--Structural Equation Modeling (PLS-SEM) and regression to model the relationships among job demands, job resources, and burnout, complemented by a qualitative analysis of open-ended responses to contextualize the quantitative findings. Our results show that GenAI adoption heightens burnout by increasing job demands, while job resources and positive perceptions of GenAI mitigate these effects, reframing adoption as an opportunity.
Authors:Bo Ma, Hang Li, ZeHua Hu, XiaoFan Gui, LuYao Liu, Simon Liu
Abstract:
Multi-agent systems powered by large language models have demonstrated remarkable capabilities across diverse domains, yet existing automated design approaches seek monolithic solutions that fail to adapt resource allocation based on query complexity and domain requirements. This paper introduces AutoMaAS, a self-evolving multi-agent architecture search framework that leverages neural architecture search principles to automatically discover optimal agent configurations through dynamic operator lifecycle management and automated machine learning techniques. Our approach incorporates four key innovations: (1) automatic operator generation, fusion, and elimination based on performance-cost analysis, (2) dynamic cost-aware optimization with real-time parameter adjustment, (3) online feedback integration for continuous architecture refinement, and (4) enhanced interpretability through decision tracing mechanisms. Extensive experiments across six benchmarks demonstrate that AutoMaAS achieves 1.0-7.1\% performance improvement while reducing inference costs by 3-5\% compared to state-of-the-art methods. The framework shows superior transferability across datasets and LLM backbones, establishing a new paradigm for automated multi-agent system design in the era of large language models.
Authors:Varun Kotian, Vishrut Jain, Andrea Michelle Rios Lazcano, Daan Marinus Pool, Riender Happee, Barys Shyrokau
Abstract:
Driving simulators are increasingly used in research and development. However, simulators often cause motion sickness due to downscaled motion and unscaled veridical visuals. In this paper, a motion cueing algorithm is proposed that reduces motion sickness as predicted by the subjective vertical conflict (SVC) model using model predictive control (MPC). Both sensory conflict and specific force errors are penalised in the cost function, allowing the algorithm to jointly optimise fidelity and comfort. Human-in-the-loop experiments were conducted to compare four simulator motion settings: two variations of our MPC-based algorithm, one focused on pure specific force tracking and the second compromising specific force tracking and motion sickness minimisation, as well as reference adaptive washout and no motion cases. The experiments were performed on a hexapod driving simulator with participants exposed to passive driving. Experimental motion sickness results closely matched the sickness model predictions. As predicted by the model, the no motion condition yielded the lowest sickness levels. However, it was rated lowest in terms of fidelity. The compromise solution reduced sickness by over 50% (average MISC level 3 to 1.5) compared to adaptive washout and the algorithm focusing on specific force tracking, without any significant reduction in fidelity rating. The proposed approach for developing MCA that takes into account both the simulator dynamics and time evolution of motion sickness offers a significant advancement in achieving an optimal control of motion sickness and specific force recreation in driving simulators, supporting broader simulator use.
Authors:Stephen James Krol, Maria Teresa Llano, Jon McCormack
Abstract:
This paper investigates the importance of personal ownership in musical AI design, examining how practising musicians can maintain creative control over the compositional process. Through a four-week ecological evaluation, we examined how a music variation tool, reliant on the skill of musicians, functioned within a composition setting. Our findings demonstrate that the dependence of the tool on the musician's ability, to provide a strong initial musical input and to turn moments into complete musical ideas, promoted ownership of both the process and artefact. Qualitative interviews further revealed the importance of this personal ownership, highlighting tensions between technological capability and artistic identity. These findings provide insight into how musical AI can support rather than replace human creativity, highlighting the importance of designing tools that preserve the humanness of musical expression.
Authors:David Li, Nels Numan, Xun Qian, Yanhe Chen, Zhongyi Zhou, Evgenii Alekseev, Geonsun Lee, Alex Cooper, Min Xia, Scott Chung, Jeremy Nelson, Xiuxiu Yuan, Jolica Dias, Tim Bettridge, Benjamin Hersh, Michelle Huynh, Konrad Piascik, Ricardo Cabello, David Kim, Ruofei Du
Abstract:
We are on the cusp where Artificial Intelligence (AI) and Extended Reality (XR) are converging to unlock new paradigms of interactive computing. However, a significant gap exists between the ecosystems of these two fields: while AI research and development is accelerated by mature frameworks like JAX and benchmarks like LMArena, prototyping novel AI-driven XR interactions remains a high-friction process, often requiring practitioners to manually integrate disparate, low-level systems for perception, rendering, and interaction. To bridge this gap, we present XR Blocks, a cross-platform framework designed to accelerate human-centered AI + XR innovation. XR Blocks strives to provide a modular architecture with plug-and-play components for core abstraction in AI + XR: user, world, peers; interface, context, and agents. Crucially, it is designed with the mission of "reducing frictions from idea to reality", thus accelerating rapid prototyping of AI + XR apps. Built upon accessible technologies (WebXR, three.js, TensorFlow, Gemini), our toolkit lowers the barrier to entry for XR creators. We demonstrate its utility through a set of open-source templates, samples, and advanced demos, empowering the community to quickly move from concept to interactive XR prototype. Site: https://xrblocks.github.io
Authors:Arkadiy Saakyan, Najoung Kim, Smaranda Muresan, Tuhin Chakrabarty
Abstract:
N-gram novelty is widely used to evaluate language models' ability to generate text outside of their training data. More recently, it has also been adopted as a metric for measuring textual creativity. However, theoretical work on creativity suggests that this approach may be inadequate, as it does not account for creativity's dual nature: novelty (how original the text is) and appropriateness (how sensical and pragmatic it is). We investigate the relationship between this notion of creativity and n-gram novelty through 7542 expert writer annotations (n=26) of novelty, pragmaticality, and sensicality via close reading of human and AI-generated text. We find that while n-gram novelty is positively associated with expert writer-judged creativity, ~91% of top-quartile expressions by n-gram novelty are not judged as creative, cautioning against relying on n-gram novelty alone. Furthermore, unlike human-written text, higher n-gram novelty in open-source LLMs correlates with lower pragmaticality. In an exploratory study with frontier close-source models, we additionally confirm that they are less likely to produce creative expressions than humans. Using our dataset, we test whether zero-shot, few-shot, and finetuned models are able to identify creative expressions (a positive aspect of writing) and non-pragmatic ones (a negative aspect). Overall, frontier LLMs exhibit performance much higher than random but leave room for improvement, especially struggling to identify non-pragmatic expressions. We further find that LLM-as-a-Judge novelty scores from the best-performing model were predictive of expert writer preferences.
Authors:Yiluo Wei, Yupeng He, Gareth Tyson
Abstract:
VTubers, digital personas represented by animated avatars, have gained massive popularity. Traditionally, VTubers are operated and voiced by human controllers known as Nakanohito. The reliance on Nakanohito, however, poses risks due to potential personal controversies and operational disruptions. The emergence of AI-driven VTubers offers a new model free from these human constraints. While AI-driven VTubers present benefits such as continuous operation and reduced scandal risk, they also raise questions about authenticity and audience engagement. Therefore, to gain deeper insights, we conduct a case study, investigating viewer perceptions of Neuro-sama, the most popular AI-driven VTuber with 845k followers on Twitch and 753k followers on YouTube. We analyze 108k Reddit posts and 136k YouTube comments, aiming to better understand viewer motivations, how AI constructs the virtual persona, and perceptions of the AI as Nakanohito. Our findings enhance the understanding of AI-driven VTubers and their impact on digital streaming culture.
Authors:Weiye Xu, Zhang Jiang, Siqi Zheng, Xiyuxing Zhang, Yankai Zhao, Changhao Zhang, Jian Liu, Weiqiang Wang, Yuntao Wang
Abstract:
With the rapid advancement of smart glasses, voice interaction has become widely deployed due to its naturalness and convenience. However, its practicality is often undermined by the vulnerability to spoofing attacks and interference from surrounding sounds, making seamless voice authentication crucial for smart glasses usage. To address this challenge, we propose AuthGlass, a voice authentication approach that leverages both air- and bone-conducted speech features to enhance accuracy and liveness detection. Aiming to gain comprehensive knowledge on speech-related acoustic and vibration features, we built a smart glasses prototype with redundant synchronized microphones: 14 air-conductive microphones and 2 bone-conductive units. In a study with 42 participants, we validated that combining sound-field and vibration features significantly improves authentication robustness and attack resistance. Furthermore, experiments demonstrated that AuthGlass maintains competitive accuracy even under various practical scenarios, highlighting its applicability and scalability for real-world deployment.
Authors:Yiren Liu, Viraj Shah, Sangho Suh, Pao Siangliulue, Tal August, Yun Huang
Abstract:
Recent advances in multi-agent systems (MAS) enable tools for information search and ideation by assigning personas to agents. However, how users can effectively control, steer, and critically evaluate collaboration among multiple domain-expert agents remains underexplored. We present Perspectra, an interactive MAS that visualizes and structures deliberation among LLM agents via a forum-style interface, supporting @-mention to invite targeted agents, threading for parallel exploration, with a real-time mind map for visualizing arguments and rationales. In a within-subjects study with 18 participants, we compared Perspectra to a group-chat baseline as they developed research proposals. Our findings show that Perspectra significantly increased the frequency and depth of critical-thinking behaviors, elicited more interdisciplinary replies, and led to more frequent proposal revisions than the group chat condition. We discuss implications for designing multi-agent tools that scaffold critical thinking by supporting user control over multi-agent adversarial discourse.
Authors:Sohyeon Hwang, Sophie Rollins, Thatiany Andrade Nunes, Yuhan Liu, Richmond Wong, Aaron Shaw, Andrés Monroy-Hernández
Abstract:
Decentralizing the governance of social computing systems to communities promises to empower them to make independent decisions, with nuance and in accordance with their values. Yet, communities do not govern in isolation. Many problems communities face are common, or move across their boundaries. We therefore propose designing for "inter-community governance:" mechanisms that support relationships and interactions between communities to coordinate on governance issues. Drawing from workshops with 24 individuals on decentralized, community-run social media, we present six challenges in designing for inter-community governance surfaced through ideas proposed in workshops. Together, these ideas come together as an ecosystem of resources, infrastructures, and tools that highlight three key principles for designing for inter-community governance: modularity, forkability, and polycentricity. We end with a discussion of how the ideas proposed in workshops might be implemented in future work aiming to support community governance in social computing systems broadly.
Authors:Francesco Argenziano, Elena Umili, Francesco Leotta, Daniele Nardi
Abstract:
Recent years have witnessed a growing interest in automating labor-intensive and complex activities, i.e., those consisting of multiple atomic tasks, by deploying robots in dynamic and unpredictable environments such as industrial and agricultural settings. A key characteristic of these contexts is that activities are not predefined: while they involve a limited set of possible tasks, their combinations may vary depending on the situation. Moreover, despite recent advances in robotics, the ability for humans to monitor the progress of high-level activities - in terms of past, present, and future actions - remains fundamental to ensure the correct execution of safety-critical processes. In this paper, we introduce a general architecture that integrates Large Language Models (LLMs) with automated planning, enabling humans to specify high-level activities (also referred to as processes) using natural language, and to monitor their execution by querying a robot. We also present an implementation of this architecture using state-of-the-art components and quantitatively evaluate the approach in a real-world precision agriculture scenario.
Authors:Maria Ibrahim, Alap Kshirsagar, Dorothea Koert, Jan Peters
Abstract:
Effective communication is essential for safety and efficiency in human-robot collaboration, particularly in shared workspaces. This paper investigates the impact of nonverbal communication on human-robot interaction (HRI) by integrating reactive light signals and emotional displays into a robotic system. We equipped a Franka Emika Panda robot with an LED strip on its end effector and an animated facial display on a tablet to convey movement intent through colour-coded signals and facial expressions. We conducted a human-robot collaboration experiment with 18 participants, evaluating three conditions: LED signals alone, LED signals with reactive emotional displays, and LED signals with pre-emptive emotional displays. We collected data through questionnaires and position tracking to assess anticipation of potential collisions, perceived clarity of communication, and task performance. The results indicate that while emotional displays increased the perceived interactivity of the robot, they did not significantly improve collision anticipation, communication clarity, or task efficiency compared to LED signals alone. These findings suggest that while emotional cues can enhance user engagement, their impact on task performance in shared workspaces is limited.
Authors:Shomik Jain, Charlotte Park, Matheus Mesquita Viana, Ashia Wilson, Dana Calacci
Abstract:
We investigate whether long-context interactions between users and LLMs lead to AI mirroring behaviors. We focus on two forms of mirroring: (1) sycophancy -- the tendency of models to be overly agreeable with users, and (2) perspective mimesis -- the extent to which models reflect a user's perspective. Using two weeks of interaction context collected from 38 users, we compare model responses with and without long-context for two tasks: political explanations and personal advice. Our results demonstrate how and when real-world interaction contexts can amplify AI mirroring behaviors. We find that sycophancy increases in long-context, irrespective of the interaction topics. Perspective mimesis increases only in contexts where models can accurately infer user perspectives.
Authors:Siqi Zhang, Xiyuxing Zhang, Duc Vu, Tao Qiang, Clara Palacios, Jiangyifei Zhu, Yuntao Wang, Mayank Goel, Justin Chan
Abstract:
We present LubDubDecoder, a system that enables fine-grained monitoring of micro-cardiac vibrations associated with the opening and closing of heart valves across a range of hearables. Our system transforms the built-in speaker, the only transducer common to all hearables, into an acoustic sensor that captures the coarse "lub-dub" heart sounds, leverages their shared temporal and spectral structure to reconstruct the subtle seismocardiography (SCG) and gyrocardiography (GCG) waveforms, and extract the timing of key micro-cardiac events. In an IRB-approved feasibility study with 18 users, our system achieves correlations of 0.88-0.95 compared to chest-mounted reference measurements in within-user and cross-user evaluations, and generalizes to unseen hearables using a zero-effort adaptation scheme with a correlation of 0.91. Our system is robust across remounting sessions and music playback.
Authors:Lief Esbenshade, Shawon Sarkar, Drew Nucci, Ann Edwards, Sarah Nielsen, Joshua M. Rosenberg, Alex Liu, Zewei, Tian, Min Sun, Zachary Zhang, Thomas Han, Yulia Lapicus, Kevin He
Abstract:
In this report, we share findings from a nationally representative survey of US public school math and science teachers, examining current generative AI (GenAI) use, perceptions, constraints, and institutional support. We show trends in math and science teacher adoption of GenAI, including frequency and purpose of use. We describe how teachers use GenAI with students and their beliefs about GenAI's impact on student learning. We share teachers' reporting on the school and district support they are receiving for GenAI learning and implementation, and the support they would like schools and districts to provide, and close with implications for policy, practice, and research. Given the rapid pace of GenAI development and growing pressure on schools to integrate emerging technologies, these findings offer timely insights into how frontline educators are navigating this shift in practice.
Authors:Geonsun Lee, Min Xia, Nels Numan, Xun Qian, David Li, Yanhe Chen, Achin Kulshrestha, Ishan Chatterjee, Yinda Zhang, Dinesh Manocha, David Kim, Ruofei Du
Abstract:
Proactive AR agents promise context-aware assistance, but their interactions often rely on explicit voice prompts or responses, which can be disruptive or socially awkward. We introduce Sensible Agent, a framework designed for unobtrusive interaction with these proactive agents. Sensible Agent dynamically adapts both "what" assistance to offer and, crucially, "how" to deliver it, based on real-time multimodal context sensing. Informed by an expert workshop (n=12) and a data annotation study (n=40), the framework leverages egocentric cameras, multimodal sensing, and Large Multimodal Models (LMMs) to infer context and suggest appropriate actions delivered via minimally intrusive interaction modes. We demonstrate our prototype on an XR headset through a user study (n=10) in both AR and VR scenarios. Results indicate that Sensible Agent significantly reduces perceived interaction effort compared to voice-prompted baseline, while maintaining high usability and achieving higher preference.
Authors:Lorenzo Neil, Deepthi Mungara, Laurie Williams, Yasemin Acar, Bradley Reaves
Abstract:
Software developers face risks of leaking their software secrets, such as API keys or passwords, which can result in significant harm. Secret management tools (SMTs), such as HashiCorp Vault Secrets or Infisical, are highly recommended by industry, academia, and security guidelines to manage secrets securely. SMTs are designed to help developers secure their secrets in a central location, yet secrets leaks are still commonplace, and developers report difficulty in learning how to setup and use SMTs. While SMTs typically come with publicly available help resources (e.g., tool documentation and interfaces), it is unclear if these actually help developers learn to effectively use SMTs. Without usable help resources that onboards developers, quick adoption and effective use of SMTs may be unrealistic. In a qualitative two-step study, we observed 21 new users in person while they used SMTs to perform two secret management tasks: secret storage and access, then secret injection. We interviewed participants after each task to identify their challenges and experiences using SMTs, with the assistance of help resources. While our study sample is narrow, it serves as a reasonable proxy for new developers who are likely to adopt SMTs early in their careers. We found that even in a laboratory setting where new users found tool functionality, interface flexibility helpful, they still experienced increased difficulty to effectively use SMTs to securely remediate a hard-coded secret when they felt tool documentation was insufficient and it motivated participants to deviate from official tool documentation to access secondary sources or attempt workaround methods. Specific challenges reported by participants were tool documentation content quality, navigation difficulties with both tool documentation and web interfaces for finding helpful content, and supportive tool features.
Authors:Yaman Yu, Yiren Liu, Jacky Zhang, Yun Huang, Yang Wang
Abstract:
Large Language Models (LLMs) are increasingly used by teenagers and young adults in everyday life, ranging from emotional support and creative expression to educational assistance. However, their unique vulnerabilities and risk profiles remain under-examined in current safety benchmarks and moderation systems, leaving this population disproportionately exposed to harm. In this work, we present Youth AI Risk (YAIR), the first benchmark dataset designed to evaluate and improve the safety of youth LLM interactions. YAIR consists of 12,449 annotated conversation snippets spanning 78 fine grained risk types, grounded in a taxonomy of youth specific harms such as grooming, boundary violation, identity confusion, and emotional overreliance. We systematically evaluate widely adopted moderation models on YAIR and find that existing approaches substantially underperform in detecting youth centered risks, often missing contextually subtle yet developmentally harmful interactions. To address these gaps, we introduce YouthSafe, a real-time risk detection model optimized for youth GenAI contexts. YouthSafe significantly outperforms prior systems across multiple metrics on risk detection and classification, offering a concrete step toward safer and more developmentally appropriate AI interactions for young users.
Authors:Xiaozhou Ye, Kevin I-Kai Wang
Abstract:
Human Activity Recognition (HAR) using wearable sensors is crucial for healthcare, fitness tracking, and smart environments, yet cross-user variability -- stemming from diverse motion patterns, sensor placements, and physiological traits -- hampers generalization in real-world settings. Conventional supervised learning methods often overfit to user-specific patterns, leading to poor performance on unseen users. Existing domain generalization approaches, while promising, frequently overlook temporal dependencies or depend on impractical domain-specific labels. We propose Temporal-Preserving Reinforcement Learning Domain Generalization (TPRL-DG), a novel framework that redefines feature extraction as a sequential decision-making process driven by reinforcement learning. TPRL-DG leverages a Transformer-based autoregressive generator to produce temporal tokens that capture user-invariant activity dynamics, optimized via a multi-objective reward function balancing class discrimination and cross-user invariance. Key innovations include: (1) an RL-driven approach for domain generalization, (2) autoregressive tokenization to preserve temporal coherence, and (3) a label-free reward design eliminating the need for target user annotations. Evaluations on the DSADS and PAMAP2 datasets show that TPRL-DG surpasses state-of-the-art methods in cross-user generalization, achieving superior accuracy without per-user calibration. By learning robust, user-invariant temporal patterns, TPRL-DG enables scalable HAR systems, facilitating advancements in personalized healthcare, adaptive fitness tracking, and context-aware environments.
Authors:Yiluo Wei, Jiahui He, Gareth Tyson
Abstract:
Recently, a distinct form of online antisocial behavior, known as "fanchuan", has emerged across online platforms, particularly in livestreaming chats. Fanchuan is an indirect attack on a specific entity, such as a celebrity, video game, or brand. It entails two main actions: (i) individuals first feign support for the entity, and exhibit this allegiance widely; (ii) they then engage in offensive or irritating behavior, attempting to undermine the entity by association. This deceptive conduct is designed to tarnish the reputation of the target and/or its fan community. Fanchuan is a novel, covert and indirect form of social attack, occurring outside the targeted community (often in a similar or broader community), with strategic long-term objectives. This distinguishes fanchuan from other types of antisocial behavior and presents significant new challenges in moderation. We argue it is crucial to understand and combat this new malicious behavior. Therefore, we conduct the first empirical study on fanchuan behavior in livestreaming chats, focusing on Bilibili, a leading livestreaming platform in China. Our dataset covers 2.7 million livestreaming sessions on Bilibili, featuring 3.6 billion chat messages. We identify 130k instances of fanchuan behavior across 37.4k livestreaming sessions. Through various types of analysis, our research offers valuable insights into fanchuan behavior and its perpetrators.
Authors:Arran Zeyu Wang, Ghulam Jilani Quadri, Mengyuan Zhu, Chin Tseng, Danielle Albers Szafir
Abstract:
Understanding how people perceive visualizations is crucial for designing effective visual data representations; however, many heuristic design guidelines are derived from specific tasks or visualization types, without considering the constraints or conditions under which those guidelines hold. In this work, we aimed to assess existing design heuristics for categorical visualization using well-established psychological knowledge. Specifically, we examine the impact of the subitizing phenomenon in cognitive psychology -- people's ability to automatically recognize a small set of objects instantly without counting -- in data visualizations. We conducted three experiments with multi-class scatterplots -- between 2 and 15 classes with varying design choices -- across three different tasks -- class estimation, correlation comparison, and clustering judgments -- to understand how performance changes as the number of classes (and therefore set size) increases. Our results indicate if the category number is smaller than six, people tend to perform well at all tasks, providing empirical evidence of subitizing in visualization. When category numbers increased, performance fell, with the magnitude of the performance change depending on task and encoding. Our study bridges the gap between heuristic guidelines and empirical evidence by applying well-established psychological theories, suggesting future opportunities for using psychological theories and constructs to characterize visualization perception.
Authors:Natalia Kucirkova, Alexis Hiniker, Megumi Ishikawa, Sho Tsuji, Aayushi Dangol, Robert Wolfe
Abstract:
This paper outlines the challenges and opportunities of research on conversational agents with children and young people across four countries, exploring the ways AI technologies can support children's well-being across social and cultural contexts.
Authors:Steffen Hauck, Diar Abdlkarim, John Dudley, Per Ola Kristensson, Eyal Ofek, Jens Grubert
Abstract:
Human-Robot-Collaboration can enhance workflows by leveraging the mutual strengths of human operators and robots. Planning and understanding robot movements remain major challenges in this domain. This problem is prevalent in dynamic environments that might need constant robot motion path adaptation. In this paper, we investigate whether a minimalistic encoding of the reachability of a point near an object of interest, which we call ReachVox, can aid the collaboration between a remote operator and a robotic arm in VR. Through a user study (n=20), we indicate the strength of the visualization relative to a point-based reachability check-up.
Authors:Robert Wolfe, Aayushi Dangol, JaeWon Kim, Alexis Hiniker
Abstract:
We introduce Needs-Conscious Design, a human-centered framework for AI-mediated communication that builds on the principles of Nonviolent Communication (NVC). We conducted an interview study with N=14 certified NVC trainers and a diary study and co-design with N=13 lay users of online communication technologies to understand how NVC might inform design that centers human relationships. We define three pillars of Needs-Conscious Design: Intentionality, Presence, and Receptiveness to Needs. Drawing on participant co-designs, we provide design concepts and illustrative examples for each of these pillars. We further describe a problematic emergent property of AI-mediated communication identified by participants, which we call Empathy Fog, and which is characterized by uncertainty over how much empathy, attention, and effort a user has actually invested via an AI-facilitated online interaction. Finally, because even well-intentioned designs may alter user behavior and process emotional data, we provide guiding questions for consentful Needs-Conscious Design, applying an affirmative consent framework used in social media contexts. Needs-Conscious Design offers a foundation for leveraging AI to facilitate human connection, rather than replacing or obscuring it.
Authors:Yupei Li, Qiyang Sun, Michelle Schlicher, Yee Wen Lim, Björn W. Schuller
Abstract:
Affective Computing (AC) has enabled Artificial Intelligence (AI) systems to recognise, interpret, and respond to human emotions - a capability also known as Artificial Emotional Intelligence (AEI). It is increasingly seen as an important component of Artificial General Intelligence (AGI). We discuss whether in order to peruse this goal, AI benefits from moving beyond emotion recognition and synthesis to develop internal emotion-like states, which we term as Artificial Emotion (AE). This shift potentially allows AI to benefit from the paradigm of `inner emotions' in ways we - as humans - do. Although recent research shows early signs that AI systems may exhibit AE-like behaviours, a clear framework for how emotions can be realised in AI remains underexplored. In this paper, we discuss potential advantages of AE in AI, review current manifestations of AE in machine learning systems, examine emotion-modulated architectures, and summarise mechanisms for modelling and integrating AE into future AI. We also explore the ethical implications and safety risks associated with `emotional' AGI, while concluding with our opinion on how AE could be beneficial in the future.
Authors:Francisco Bolaños, Angelo Salatino, Francesco Osborne, Enrico Motta
Abstract:
Previous work has demonstrated that AI methods for analysing scientific literature benefit significantly from annotating sentences in papers according to their rhetorical roles, such as research gaps, results, limitations, extensions of existing methodologies, and others. Such representations also have the potential to support the development of a new generation of systems capable of producing high-quality literature reviews. However, achieving this goal requires the definition of a relevant annotation schema and effective strategies for large-scale annotation of the literature. This paper addresses these challenges by 1) introducing a novel annotation schema specifically designed to support literature review generation and 2) conducting a comprehensive evaluation of a wide range of state-of-the-art large language models (LLMs) in classifying rhetorical roles according to this schema. To this end, we also present Sci-Sentence, a novel multidisciplinary benchmark comprising 700 sentences manually annotated by domain experts and 2,240 sentences automatically labelled using LLMs. We evaluate 37 LLMs on this benchmark, spanning diverse model families and sizes, using both zero-shot learning and fine-tuning approaches. The experiments yield several novel insights that advance the state of the art in this challenging domain. First, the current generation of LLMs performs remarkably well on this task when fine-tuned on high-quality data, achieving performance levels above 96\% F1. Second, while large proprietary models like GPT-4o achieve the best results, some lightweight open-source alternatives also demonstrate excellent performance. Finally, enriching the training data with semi-synthetic examples generated by LLMs proves beneficial, enabling small encoders to achieve robust results and significantly enhancing the performance of several open decoder models.
Authors:Cuno Sankey-Olsen, Rasmus Hvass Olesen, Tobias Oliver Eberhard, Andreas Triantafyllopoulos, Björn Schuller, Ilhan Aslan
Abstract:
Chronic Obstructive Pulmonary Disease (COPD) is a serious and debilitating disease affecting millions around the world. Its early detection using non-invasive means could enable preventive interventions that improve quality of life and patient outcomes, with speech recently shown to be a valuable biomarker. Yet, its validity across different linguistic groups remains to be seen. To that end, audio data were collected from 96 Danish participants conducting three speech tasks (reading, coughing, sustained vowels). Half of the participants were diagnosed with different levels of COPD and the other half formed a healthy control group. Subsequently, we investigated different baseline models using openSMILE features and learnt x-vector embeddings. We obtained a best accuracy of 67% using openSMILE features and logistic regression. Our findings support the potential of speech-based analysis as a non-invasive, remote, and scalable screening tool as part of future COPD healthcare solutions.
Authors:Kaining Zhang, Catarina Moreira, Pedro Belchior, Gun Lee, Mark Billinghurst, Joaquim Jorge
Abstract:
We introduce \textit{HeadZoom}, a hands-free interaction technique for navigating two-dimensional visual content using head movements. HeadZoom enables fluid zooming and panning using only real-time head tracking. It supports natural control in applications such as map exploration, radiograph inspection, and image browsing, where physical interaction is limited. We evaluated HeadZoom in a within-subjects study comparing three interaction techniques-Static, Tilt Zoom, and Parallel Zoom-across spatial, error, and subjective metrics. Parallel Zoom significantly reduced total head movement compared to Static and Tilt modes. Users reported significantly lower perceived exertion for Parallel Zoom, confirming its suitability for prolonged or precision-based tasks. By minimizing movement demands while maintaining task effectiveness, HeadZoom advances the design of head-based 2D interaction in VR and creates new opportunities for accessible hands-free systems for image exploration.
Authors:Rachel Lin, Bhavya Chopra, Wenjing Lin, Shreya Shankar, Madelon Hulsebos, Aditya G. Parameswaran
Abstract:
Dataset Search -- the process of finding appropriate datasets for a given task -- remains a critical yet under-explored challenge in data science workflows. Assessing dataset suitability for a task (e.g., training a classification model) is a multi-pronged affair that involves understanding: data characteristics (e.g. granularity, attributes, size), semantics (e.g., data semantics, creation goals), and relevance to the task at hand. Present-day dataset search interfaces are restrictive -- users struggle to convey implicit preferences and lack visibility into the search space and result inclusion criteria -- making query iteration challenging. To bridge these gaps, we introduce DataScout to proactively steer users through the process of dataset discovery via -- (i) AI-assisted query reformulations informed by the underlying search space, (ii) semantic search and filtering based on dataset content, including attributes (columns) and granularity (rows), and (iii) dataset relevance indicators, generated dynamically based on the user-specified task. A within-subjects study with 12 participants comparing DataScout to keyword and semantic dataset search reveals that users uniquely employ DataScout's features not only for structured explorations, but also to glean feedback on their search queries and build conceptual models of the search space.
Authors:Alex Liu, Lief Esbenshade, Shawon Sarkar, Victor Tian, Zachary Zhang, Kevin He, Min Sun
Abstract:
The integration of large language models (LLMs) into educational tools has the potential to substantially impact how teachers plan instruction, support diverse learners, and engage in professional reflection. Yet little is known about how educators actually use these tools in practice and how their interactions with AI can be meaningfully studied at scale. This paper presents a human-AI collaborative methodology for large-scale qualitative analysis of over 140,000 educator-AI messages drawn from a generative AI platform used by K-12 teachers. Through a four-phase coding pipeline, we combined inductive theme discovery, codebook development, structured annotation, and model benchmarking to examine patterns of educator engagement and evaluate the performance of LLMs in qualitative coding tasks. We developed a hierarchical codebook aligned with established teacher evaluation frameworks, capturing educators' instructional goals, contextual needs, and pedagogical strategies. Our findings demonstrate that LLMs, particularly Claude 3.5 Haiku, can reliably support theme identification, extend human recognition in complex scenarios, and outperform open-weight models in both accuracy and structural reliability. The analysis also reveals substantive patterns in how educators inquire AI to enhance instructional practices (79.7 percent of total conversations), create or adapt content (76.1 percent), support assessment and feedback loop (46.9 percent), attend to student needs for tailored instruction (43.3 percent), and assist other professional responsibilities (34.2 percent), highlighting emerging AI-related competencies that have direct implications for teacher preparation and professional development. This study offers a scalable, transparent model for AI-augmented qualitative research and provides foundational insights into the evolving role of generative AI in educational practice.
Authors:Noe Zermeño, Cristina Zuheros, Lucas Daniel Del Rosso Calache, Francisco Herrera, Rosana Montes
Abstract:
In recent years, attention has increasingly focused on enhancing user satisfaction with user interfaces, spanning both mobile applications and websites. One fundamental aspect of human-machine interaction is the concept of web usability. In order to assess web usability, the A/B testing technique enables the comparison of data between two designs. Expanding the scope of tests to include the designs being evaluated, in conjunction with the involvement of both real and fictional users, presents a challenge for which few online tools offer support. We propose a methodology for web usability evaluation based on user-centered approaches such as design thinking and linguistic decision-making, named Linguistic Decision-Making for Web Usability Evaluation. This engages people in role-playing scenarios and conducts a number of usability tests, including the widely recognized System Usability Scale. We incorporate the methodology into a decision support system based on A/B testing. We use real users in a case study to assess three Moodle platforms at the University of Guadalajara, Mexico.
Authors:Yu Liu, Leyuan Qu, Hanlei Shi, Di Gao, Yuhua Zheng, Taihao Li
Abstract:
Dynamic Facial Expression Recognition (DFER) aims to identify human emotions from temporally evolving facial movements and plays a critical role in affective computing. While recent vision-language approaches have introduced semantic textual descriptions to guide expression recognition, existing methods still face two key limitations: they often underutilize the subtle emotional cues embedded in generated text, and they have yet to incorporate sufficiently effective mechanisms for filtering out facial dynamics that are irrelevant to emotional expression. To address these gaps, We propose GRACE, Granular Representation Alignment for Cross-modal Emotion recognition that integrates dynamic motion modeling, semantic text refinement, and token-level cross-modal alignment to facilitate the precise localization of emotionally salient spatiotemporal features. Our method constructs emotion-aware textual descriptions via a Coarse-to-fine Affective Text Enhancement (CATE) module and highlights expression-relevant facial motion through a motion-difference weighting mechanism. These refined semantic and visual signals are aligned at the token level using entropy-regularized optimal transport. Experiments on three benchmark datasets demonstrate that our method significantly improves recognition performance, particularly in challenging settings with ambiguous or imbalanced emotion classes, establishing new state-of-the-art (SOTA) results in terms of both UAR and WAR.
Authors:Abhinav Sood, Maria Teresa Llano, Jon McCormack
Abstract:
We present a graphical, node-based system through which users can visually chain generative AI models for creative tasks. Research in the area of chaining LLMs has found that while chaining provides transparency, controllability and guardrails to approach certain tasks, chaining with pre-defined LLM steps prevents free exploration. Using cognitive processes from creativity research as a basis, we create a system that addresses the inherent constraints of chat-based AI interactions. Specifically, our system aims to overcome the limiting linear structure that inhibits creative exploration and ideation. Further, our node-based approach enables the creation of reusable, shareable templates that can address different creative tasks. In a small-scale user study, we find that our graph-based system supports ideation and allows some users to better visualise and think through their writing process when compared to a similar conversational interface. We further discuss the weaknesses and limitations of our system, noting the benefits to creativity that user interfaces with higher complexity can provide for users who can effectively use them.
Authors:Taewook Kim, Matthew Kay, Yuqian Sun, Melissa Roemmele, Max Kreminski, John Joon Young Chung
Abstract:
Human creative ideation involves both exploration of diverse ideas (divergence) and selective synthesis of explored ideas into coherent combinations (convergence). While processes of divergence and convergence are often interleaved and nested, existing AI-powered creativity support tools (CSTs) lack support for sophisticated orchestration of divergence and convergence. We present Reverger, an AI-powered CST that helps users ideate variations of conceptual directions for modifying a story by scaffolding flexible iteration between divergence and convergence. For divergence, our tool enables recursive exploration of alternative high-level directions for modifying a specific part of the original story. For convergence, it allows users to collect explored high-level directions and synthesize them into concrete variations. Users can then iterate between divergence and convergence until they find a satisfactory outcome. A within-subject study revealed that Reverger permitted participants to explore more unexpected and diverse high-level directions than a comparable baseline. Reverger users also felt that they had more fine-grained control and discovered more effort-worthy outcomes.
Authors:Reza Samimi, Aditya Bhattacharya, Lucija Gosak, Gregor Stiglic, Katrien Verbert
Abstract:
Healthcare professionals need effective ways to use, understand, and validate AI-driven clinical decision support systems. Existing systems face two key limitations: complex visualizations and a lack of grounding in scientific evidence. We present an integrated decision support system that combines interactive visualizations with a conversational agent to explain diabetes risk assessments. We propose a hybrid prompt handling approach combining fine-tuned language models for analytical queries with general Large Language Models (LLMs) for broader medical questions, a methodology for grounding AI explanations in scientific evidence, and a feature range analysis technique to support deeper understanding of feature contributions. We conducted a mixed-methods study with 30 healthcare professionals and found that the conversational interactions helped healthcare professionals build a clear understanding of model assessments, while the integration of scientific evidence calibrated trust in the system's decisions. Most participants reported that the system supported both patient risk evaluation and recommendation.
Authors:Ardian Selmonaj, Giacomo Del Rio, Adrian Schneider, Alessandro Antonucci
Abstract:
Achieving mission objectives in a realistic simulation of aerial combat is highly challenging due to imperfect situational awareness and nonlinear flight dynamics. In this work, we introduce a novel 3D multi-agent air combat environment and a Hierarchical Multi-Agent Reinforcement Learning framework to tackle these challenges. Our approach combines heterogeneous agent dynamics, curriculum learning, league-play, and a newly adapted training algorithm. To this end, the decision-making process is organized into two abstraction levels: low-level policies learn precise control maneuvers, while high-level policies issue tactical commands based on mission objectives. Empirical results show that our hierarchical approach improves both learning efficiency and combat performance in complex dogfight scenarios.
Authors:Fevziye Irem Eyiokur, Dogucan Yaman, Hazım Kemal Ekenel, Alexander Waibel
Abstract:
Embodied Reference Understanding requires identifying a target object in a visual scene based on both language instructions and pointing cues. While prior works have shown progress in open-vocabulary object detection, they often fail in ambiguous scenarios where multiple candidate objects exist in the scene. To address these challenges, we propose a novel ERU framework that jointly leverages LLM-based data augmentation, depth-map modality, and a depth-aware decision module. This design enables robust integration of linguistic and embodied cues, improving disambiguation in complex or cluttered environments. Experimental results on two datasets demonstrate that our approach significantly outperforms existing baselines, achieving more accurate and reliable referent detection.
Authors:Florence E. Enock, Helen Z. Margetts, Jonathan Bright
Abstract:
The prevalence of online hate and abuse is a pressing global concern. While tackling such societal harms is a priority for research across the social sciences, it is a difficult task, in part because of the magnitude of the problem. User engagement with reporting mechanisms (flagging) online is an increasingly important part of monitoring and addressing harmful content at scale. However, users may not flag content routinely enough, and when they do engage, they may be biased by group identity and political beliefs. Across five well-powered and pre-registered online experiments, we examine the extent of social bias in the flagging of hate and abuse in four different intergroup contexts: political affiliation, vaccination opinions, beliefs about climate change, and stance on abortion rights. Overall, participants reported abuse reliably, with approximately half of the abusive comments in each study reported. However, a pervasive social bias was present whereby ingroup-directed abuse was consistently flagged to a greater extent than outgroup-directed abuse. Our findings offer new insights into the nature of user flagging online, an understanding of which is crucial for enhancing user intervention against online hate speech and thus ensuring a safer online environment.
Authors:Jenny T. Liang, Titus Barik, Jeffrey Nichols, Eldon Schoop, Ruijia Cheng
Abstract:
Interface agents powered by generative AI models (referred to as "agents") can automate actions based on user commands. An important aspect of developing agents is their user experience (i.e., agent experience). There is a growing need to provide scaffolds for a broader set of individuals beyond AI engineers to prototype agent experiences, since they can contribute valuable perspectives to designing agent experiences. In this work, we explore the affordances agent prototyping systems should offer by conducting a requirements elicitation study with 12 participants with varying experience with agents. We identify key activities in agent experience prototyping and the desired capabilities of agent prototyping systems. We instantiate those capabilities in the AgentBuilder design probe for agent prototyping. We conduct an in situ agent prototyping study with 14 participants using AgentBuilder to validate the design requirements and elicit insights on how developers prototype agents and what their needs are in this process.
Authors:Xuxin Tang, Rehema Abulikemu, Eric Krokos, Kirsten Whitley, Xuan Wang, Chris North
Abstract:
Sensemaking report writing often requires multiple refinements in the iterative process. While Large Language Models (LLMs) have shown promise in generating initial reports based on human visual workspace representations, they struggle to precisely incorporate sequential semantic interactions during the refinement process. We introduce VIS-ReAct, a framework that reasons about newly-added semantic interactions in visual workspaces to steer the LLM for report refinement. VIS-ReAct is a two-agent framework: a primary LLM analysis agent interprets new semantic interactions to infer user intentions and generate refinement planning, followed by an LLM refinement agent that updates reports accordingly. Through case study, VIS-ReAct outperforms baseline and VIS-ReAct (without LLM analysis) on targeted refinement, semantic fidelity, and transparent inference. Results demonstrate that VIS-ReAct better handles various interaction types and granularities while enhancing the transparency of human-LLM collaboration.
Authors:Anku Rani, Valdemar Danry, Paul Pu Liang, Andrew B. Lippman, Pattie Maes
Abstract:
Given the growing prevalence of fake information, including increasingly realistic AI-generated news, there is an urgent need to train people to better evaluate and detect misinformation. While interactions with AI have been shown to durably reduce people's beliefs in false information, it is unclear whether these interactions also teach people the skills to discern false information themselves. We conducted a month-long study where 67 participants classified news headline-image pairs as real or fake, discussed their assessments with an AI system, followed by an unassisted evaluation of unseen news items to measure accuracy before, during, and after AI assistance. While AI assistance produced immediate improvements during AI-assisted sessions (+21\% average), participants' unassisted performance on new items declined significantly by week 4 (-15.3\%). These results indicate that while AI may help immediately, it ultimately degrades long-term misinformation detection abilities.
Authors:Matthew Yue, Zhikun Xu, Vivek Gupta, Thao Ha, Liesal Sharabi, Ben Zhou
Abstract:
Most dating technologies optimize for getting together, not staying together. We present RELATE-Sim, a theory-grounded simulator that models how couples behave at consequential turning points-exclusivity talks, conflict-and-repair episodes, relocations-rather than static traits. Two persona-aligned LLM agents (one per partner) interact under a centralized Scene Master that frames each turning point as a compact set of realistic options, advances the narrative, and infers interpretable state changes and an auditable commitment estimate after each scene. On a longitudinal dataset of 71 couples with two-year follow-ups, simulation-aware predictions outperform a personas-only baseline while surfacing actionable markers (e.g., repair attempts acknowledged, clarity shifts) that explain why trajectories diverge. RELATE-Sim pushes the relationship research's focus from matchmaking to maintenance, providing a transparent, extensible platform for understanding and forecasting long-term relationship dynamics.
Authors:Ardian Selmonaj, Giacomo Del Rio, Adrian Schneider, Alessandro Antonucci
Abstract:
We present a system that enables real-time interaction between human users and agents trained to control fighter jets in simulated 3D air combat scenarios. The agents are trained in a dedicated environment using Multi-Agent Reinforcement Learning. A communication link is developed to allow seamless deployment of trained agents into VR-Forces, a widely used defense simulation tool for realistic tactical scenarios. This integration allows mixed simulations where human-controlled entities engage with intelligent agents exhibiting distinct combat behaviors. Our interaction model creates new opportunities for human-agent teaming, immersive training, and the exploration of innovative tactics in defense contexts.
Authors:Mar Canet Sola, Varvara Guljajeva
Abstract:
"Quantum est in libris" explores the intersection of the archaic and the modern. On one side, there are manuscript materials from the Estonian National Museum's (ERM) more than century-old archive describing the life experiences of Estonian people; on the other side, there is technology that transforms these materials into a dynamic and interactive experience. Connecting technology and cultural heritage is the visitor, who turns texts into inputs for a screen sculpture. Historical narratives are visually brought to life through the contemporary technological language. Because the video AI models we employed, Runway Gen-3 and Gen-4, have not previously interacted with Estonian heritage, we can observe how machines today "read the world" and create future heritage. "Quantum est in libris" introduces an exciting yet unsettling new dimension to the concept of cultural heritage: in a world where data are fluid and interpretations unstable, heritage status becomes fragile. In the digital environment, heritage issues are no longer just about preservation and transmission, but also about representation of the media, machine creativity, and interpretive error. Who or what shapes memory processes and memory spaces, and how?
Authors:Congkai Shen, Siyuan Yu, Yifan Weng, Haoran Ma, Chen Li, Hiroshi Yasuda, James Dallas, Michael Thompson, John Subosits, Tulga Ersal
Abstract:
This study introduces a haptic shared control framework designed to teach human drivers advanced driving skills. In this context, shared control refers to a driving mode where the human driver collaborates with an autonomous driving system to control the steering of a vehicle simultaneously. Advanced driving skills are those necessary to safely push the vehicle to its handling limits in high-performance driving such as racing and emergency obstacle avoidance. Previous research has demonstrated the performance and safety benefits of shared control schemes using both subjective and objective evaluations. However, these schemes have not been assessed for their impact on skill acquisition on complex and demanding tasks. Prior research on long-term skill acquisition either applies haptic shared control to simple tasks or employs other feedback methods like visual and auditory aids. To bridge this gap, this study creates a cyber racing coach framework based on the haptic shared control paradigm and evaluates its performance in helping human drivers acquire high-performance driving skills. The framework introduces (1) an autonomous driving system that is capable of cooperating with humans in a highly performant driving scenario; and (2) a haptic shared control mechanism along with a fading scheme to gradually reduce the steering assistance from autonomy based on the human driver's performance during training. Two benchmarks are considered: self-learning (no assistance) and full assistance during training. Results from a human subject study indicate that the proposed framework helps human drivers develop superior racing skills compared to the benchmarks, resulting in better performance and consistency.
Authors:Alexander Giovannelli, Shakiba Davari, Cherelle Connor, Fionn Murphy, Trey Davis, Haichao Miao, Vuthea Chheang, Brian Giera, Peer-Timo Bremer, Doug A. Bowman
Abstract:
Virtual environments (VEs) empower geographically distributed teams to collaborate on a shared project regardless of time. Existing research has separately investigated collaborations within these VEs at the same time (i.e., synchronous) or different times (i.e., asynchronous). In this work, we highlight the often-overlooked concept of bichronous collaboration and define it as the seamless integration of archived information during a real-time collaborative session. We revisit the time-space matrix of computer-supported cooperative work (CSCW) and reclassify the time dimension as a continuum. We describe a system that empowers collaboration across the temporal states of the time continuum within a VE during remote work. We conducted a user study using the system to discover how the bichronous temporal state impacts the user experience during a collaborative inspection. Findings indicate that the bichronous temporal state is beneficial to collaborative activities for information processing, but has drawbacks such as changed interaction and positioning behaviors in the VE.
Authors:Jina Suh, Lindy Le, Erfan Shayegani, Gonzalo Ramos, Judith Amores, Desmond C. Ong, Mary Czerwinski, Javier Hernandez
Abstract:
Empathy is increasingly recognized as a key factor in human-AI communication, yet conventional approaches to "digital empathy" often focus on simulating internal, human-like emotional states while overlooking the inherently subjective, contextual, and relational facets of empathy as perceived by users. In this work, we propose a human-centered taxonomy that emphasizes observable empathic behaviors and introduce a new dataset, Sense-7, of real-world conversations between information workers and Large Language Models (LLMs), which includes per-turn empathy annotations directly from the users, along with user characteristics, and contextual details, offering a more user-grounded representation of empathy. Analysis of 695 conversations from 109 participants reveals that empathy judgments are highly individualized, context-sensitive, and vulnerable to disruption when conversational continuity fails or user expectations go unmet. To promote further research, we provide a subset of 672 anonymized conversation and provide exploratory classification analysis, showing that an LLM-based classifier can recognize 5 levels of empathy with an encouraging average Spearman $Ï$=0.369 and Accuracy=0.487 over this set. Overall, our findings underscore the need for AI designs that dynamically tailor empathic behaviors to user contexts and goals, offering a roadmap for future research and practical development of socially attuned, human-centered artificial agents.
Authors:Jas Brooks, Javier Hernandez, Mary Czerwinski, Judith Amores
Abstract:
Inspired by the role of chemosignals in conveying emotional states, this paper introduces the Affective Air Quality (AAQ) dataset, a novel dataset collected to explore the potential of volatile odor compound and gas sensor data for non-contact emotion detection. This dataset bridges the gap between the realms of breath \& body odor emission (personal chemical emissions) analysis and established practices in affective computing. Comprising 4-channel gas sensor data from 23 participants at two distances from the body (wearable and desktop), alongside emotional ratings elicited by targeted movie clips, the dataset encapsulates initial groundwork to analyze the correlation between personal chemical emissions and varied emotional responses. The AAQ dataset also provides insights drawn from exit interviews, thereby painting a holistic picture of perceptions regarding air quality monitoring and its implications for privacy. By offering this dataset alongside preliminary attempts at emotion recognition models based on it to the broader research community, we seek to advance the development of odor-based affect recognition models that prioritize user privacy and comfort.
Authors:Sungwon In, Eric Krokos, Kirsten Whitley, Chris North, Yalong Yang
Abstract:
A growing interest in Immersive Analytics (IA) has led to the extension of computational notebooks (e.g., Jupyter Notebook) into an immersive environment to enhance analytical workflows. However, existing solutions rely on the WIMP (windows, icons, menus, pointer) metaphor, which remains impractical for complex data exploration. Although embodied interaction offers a more intuitive alternative, immersive computational notebooks and embodied data exploration systems are implemented as standalone tools. This separation requires analysts to invest considerable effort to transition from one environment to an entirely different one during analytical workflows. To address this, we introduce ICoN, a prototype that facilitates a seamless transition between computational notebooks and embodied data explorations within a unified, fully immersive environment. Our findings reveal that unification improves transition efficiency and intuitiveness during analytical workflows, highlighting its potential for seamless data analysis.
Authors:Sungwon In, Eric Krokos, Kirsten Whitley, Chris North, Yalong Yang
Abstract:
As immersive technologies evolve, immersive computational notebooks offer new opportunities for interacting with code, data, and outputs. However, scaling these environments remains a challenge, particularly when analysts manually arrange large numbers of cells to maintain both execution logic and visual coherence. To address this, we introduce an embodied composition framework, facilitating organizational processes in the context of immersive computational notebooks. To evaluate the effectiveness of the embodied composition framework, we conducted a controlled user study comparing manual and embodied composition frameworks in an organizational process. The results show that embodied composition frameworks significantly reduced user effort and decreased completion time. However, the design of the triggering mechanism requires further refinement. Our findings highlight the potential of embodied composition frameworks to enhance the scalability of the organizational process in immersive computational notebooks.
Authors:Sabrina Patania, Luca Annese, Anna Lambiase, Anita Pellegrini, Tom Foulsham, Azzurra Ruggeri, Silvia Rossi, Silvia Serino, Dimitri Ognibene
Abstract:
Language and embodied perspective taking are essential for human collaboration, yet few computational models address both simultaneously. This work investigates the PerspAct system [1], which integrates the ReAct (Reason and Act) paradigm with Large Language Models (LLMs) to simulate developmental stages of perspective taking, grounded in Selman's theory [2]. Using an extended director task, we evaluate GPT's ability to generate internal narratives aligned with specified developmental stages, and assess how these influence collaborative performance both qualitatively (action selection) and quantitatively (task efficiency). Results show that GPT reliably produces developmentally-consistent narratives before task execution but often shifts towards more advanced stages during interaction, suggesting that language exchanges help refine internal representations. Higher developmental stages generally enhance collaborative effectiveness, while earlier stages yield more variable outcomes in complex contexts. These findings highlight the potential of integrating embodied perspective taking and language in LLMs to better model developmental dynamics and stress the importance of evaluating internal speech during combined linguistic and embodied tasks.
Authors:Florian Lehmann, Krystsina Shauchenka, Daniel Buschek
Abstract:
Current AI writing support tools are largely designed for individuals, complicating collaboration when co-writers must leave the shared workspace to use AI and then communicate and reintegrate results. We propose integrating AI agents directly into collaborative writing environments. Our prototype makes AI use transparent and customisable through two new shared objects: agent profiles and tasks. Agent responses appear in the familiar comment feature. In a user study (N=30), 14 teams worked on writing projects during one week. Interaction logs and interviews show that teams incorporated agents into existing norms of authorship, control, and coordination, rather than treating them as team members. Agent profiles were viewed as personal territory, while created agents and outputs became shared resources. We discuss implications for team-based AI interaction, highlighting opportunities and boundaries for treating AI as a shared resource in collaborative work.
Authors:Ya-Chuan Hsu, Jonathan DeCastro, Andrew Silva, Guy Rosman
Abstract:
In time-critical settings such as assistive driving, assistants often rely on alerts or haptic signals to prompt rapid human attention, but these cues usually leave humans to interpret situations and decide responses independently, introducing potential delays or ambiguity in meaning. Language-based assistive systems can instead provide instructions backed by context, offering more informative guidance. However, current approaches (e.g., social assistive robots) largely prioritize content generation while overlooking critical timing factors such as verbal conveyance duration, human comprehension delays, and subsequent follow-through duration. These timing considerations are crucial in time-critical settings, where even minor delays can substantially affect outcomes. We aim to study this inherent trade-off between timeliness and informativeness by framing the challenge as a sequential decision-making problem using an augmented-state Markov Decision Process. We design a framework combining reinforcement learning and a generated offline taxonomy dataset, where we balance the trade-off while enabling a scalable taxonomy dataset generation pipeline. Empirical evaluation with synthetic humans shows our framework improves success rates by over 40% compared to methods that ignore time delays, while effectively balancing timeliness and informativeness. It also exposes an often-overlooked trade-off between these two factors, opening new directions for optimizing communication in time-critical human-AI assistance.
Authors:Hyunseung Lim, Dasom Choi, DaEun Choi, Sooyohn Nam, Hwajung Hong
Abstract:
Effective feedback, including critique and evaluation, helps designers develop design concepts and refine their ideas, supporting informed decision-making throughout the iterative design process. However, in studio-based design courses, students often struggle to provide feedback due to a lack of confidence and fear of being judged, which limits their ability to develop essential feedback-giving skills. Recent advances in large language models (LLMs) suggest that role-playing with AI agents can let learners engage in multi-turn feedback without the anxiety of external judgment or the time constraints of real-world settings. Yet prior studies have raised concerns that LLMs struggle to behave like real people in role-play scenarios, diminishing the educational benefits of these interactions. Therefore, designing AI-based agents that effectively support learners in practicing and developing intellectual reasoning skills requires more than merely assigning the target persona's personality and role to the agent. By addressing these issues, we present Feed-O-Meter, a novel system that employs carefully designed LLM-based agents to create an environment in which students can practice giving design feedback. The system enables users to role-play as mentors, providing feedback to an AI mentee and allowing them to reflect on how that feedback impacts the AI mentee's idea development process. A user study (N=24) indicated that Feed-O-Meter increased participants' engagement and motivation through role-switching and helped them adjust feedback to be more comprehensible for an AI mentee. Based on these findings, we discuss future directions for designing systems to foster feedback skills in design education.
Authors:Haechang Kim, Hao Chen, Can Li, Jong Min Lee
Abstract:
Explainable Reinforcement Learning (XRL) has emerged as a promising approach in improving the transparency of Reinforcement Learning (RL) agents. However, there remains a gap between complex RL policies and domain experts, due to the limited comprehensibility of XRL results and isolated coverage of current XRL approaches that leave users uncertain about which tools to employ. To address these challenges, we introduce TalkToAgent, a multi-agent Large Language Models (LLM) framework that delivers interactive, natural language explanations for RL policies. The architecture with five specialized LLM agents (Coordinator, Explainer, Coder, Evaluator, and Debugger) enables TalkToAgent to automatically map user queries to relevant XRL tools and clarify an agent's actions in terms of either key state variables, expected outcomes, or counterfactual explanations. Moreover, our approach extends previous counterfactual explanations by deriving alternative scenarios from qualitative behavioral descriptions, or even new rule-based policies. We validated TalkToAgent on quadruple-tank process control problem, a well-known nonlinear control benchmark. Results demonstrated that TalkToAgent successfully mapped user queries into XRL tasks with high accuracy, and coder-debugger interactions minimized failures in counterfactual generation. Furthermore, qualitative evaluation confirmed that TalkToAgent effectively interpreted agent's actions and contextualized their meaning within the problem domain.
Authors:Georgios Makridis, George Fragiadakis, Jorge Oliveira, Tomaz Saraiva, Philip Mavrepis, Georgios Fatouros, Dimosthenis Kyriazis
Abstract:
Current conversational AI systems often provide generic, one-size-fits-all interactions that overlook individual user characteristics and lack adaptive dialogue management. To address this gap, we introduce \textbf{HumAIne-chatbot}, an AI-driven conversational agent that personalizes responses through a novel user profiling framework. The system is pre-trained on a diverse set of GPT-generated virtual personas to establish a broad prior over user types. During live interactions, an online reinforcement learning agent refines per-user models by combining implicit signals (e.g. typing speed, sentiment, engagement duration) with explicit feedback (e.g., likes and dislikes). This profile dynamically informs the chatbot dialogue policy, enabling real-time adaptation of both content and style. To evaluate the system, we performed controlled experiments with 50 synthetic personas in multiple conversation domains. The results showed consistent improvements in user satisfaction, personalization accuracy, and task achievement when personalization features were enabled. Statistical analysis confirmed significant differences between personalized and nonpersonalized conditions, with large effect sizes across key metrics. These findings highlight the effectiveness of AI-driven user profiling and provide a strong foundation for future real-world validation.
Authors:Yanming Xiu, Maria Gorlatova
Abstract:
Augmented reality (AR) enhances user interaction with the real world but also presents vulnerabilities, particularly through Visual Information Manipulation (VIM) attacks. These attacks alter important real-world visual cues, leading to user confusion and misdirected actions. In this demo, we present a hands-on experience using a miniature city setup, where users interact with manipulated AR content via the Meta Quest 3. The demo highlights the impact of VIM attacks on user decision-making and underscores the need for effective security measures in AR systems. Future work includes a user study and cross-platform testing.
Authors:Yifan Xu, Qianwei Wang, Vineet Kamat, Carol Menassa
Abstract:
Indoor built environments like homes and offices often present complex and cluttered layouts that pose significant challenges for individuals who are blind or visually impaired, especially when performing tasks that involve locating and gathering multiple objects. While many existing assistive technologies focus on basic navigation or obstacle avoidance, few systems provide scalable and efficient multi-object search capabilities in real-world, partially observable settings. To address this gap, we introduce OpenGuide, an assistive mobile robot system that combines natural language understanding with vision-language foundation models (VLM), frontier-based exploration, and a Partially Observable Markov Decision Process (POMDP) planner. OpenGuide interprets open-vocabulary requests, reasons about object-scene relationships, and adaptively navigates and localizes multiple target items in novel environments. Our approach enables robust recovery from missed detections through value decay and belief-space reasoning, resulting in more effective exploration and object localization. We validate OpenGuide in simulated and real-world experiments, demonstrating substantial improvements in task success rate and search efficiency over prior methods. This work establishes a foundation for scalable, human-centered robotic assistance in assisted living environments.
Authors:Wonduk Seo, Taesub Shin, Hyunjin An, Dokyun Kim, Seunghyun Lee
Abstract:
Identifying whether two product listings refer to the same Stock Keeping Unit (SKU) is a persistent challenge in ecommerce, especially when explicit identifiers are missing and product names vary widely across platforms. Rule based heuristics and keyword similarity often misclassify products by overlooking subtle distinctions in brand, specification, or bundle configuration. To overcome these limitations, we propose Question to Knowledge (Q2K), a multi agent framework that leverages Large Language Models (LLMs) for reliable SKU mapping. Q2K integrates: (1) a Reasoning Agent that generates targeted disambiguation questions, (2) a Knowledge Agent that resolves them via focused web searches, and (3) a Deduplication Agent that reuses validated reasoning traces to reduce redundancy and ensure consistency. A human in the loop mechanism further refines uncertain cases. Experiments on real world consumer goods datasets show that Q2K surpasses strong baselines, achieving higher accuracy and robustness in difficult scenarios such as bundle identification and brand origin disambiguation. By reusing retrieved reasoning instead of issuing repeated searches, Q2K balances accuracy with efficiency, offering a scalable and interpretable solution for product integration.
Authors:Jazbo Beason, Ruijia Cheng, Eldon Schoop, Jeffrey Nichols
Abstract:
It is challenging to generate the code for a complete user interface using a Large Language Model (LLM). User interfaces are complex and their implementations often consist of multiple, inter-related files that together specify the contents of each screen, the navigation flows between the screens, and the data model used throughout the application. It is challenging to craft a single prompt for an LLM that contains enough detail to generate a complete user interface, and even then the result is frequently a single large and difficult to understand file that contains all of the generated screens. In this paper, we introduce Athena, a prototype application generation environment that demonstrates how the use of shared intermediate representations, including an app storyboard, data model, and GUI skeletons, can help a developer work with an LLM in an iterative fashion to craft a complete user interface. These intermediate representations also scaffold the LLM's code generation process, producing organized and structured code in multiple files while limiting errors. We evaluated Athena with a user study that found 75% of participants preferred our prototype over a typical chatbot-style baseline for prototyping apps.
Authors:Anilkumar Swamy, Vincent Leroy, Philippe Weinzaepfel, Jean-Sébastien Franco, Grégory Rogez
Abstract:
Hand-object 3D reconstruction has become increasingly important for applications in human-robot interaction and immersive AR/VR experiences. A common approach for object-agnostic hand-object reconstruction from RGB sequences involves a two-stage pipeline: hand-object 3D tracking followed by multi-view 3D reconstruction. However, existing methods rely on keypoint detection techniques, such as Structure from Motion (SfM) and hand-keypoint optimization, which struggle with diverse object geometries, weak textures, and mutual hand-object occlusions, limiting scalability and generalization. As a key enabler to generic and seamless, non-intrusive applicability, we propose in this work a robust, keypoint detector-free approach to estimating hand-object 3D transformations from monocular motion video/images. We further integrate this with a multi-view reconstruction pipeline to accurately recover hand-object 3D shape. Our method, named HOSt3R, is unconstrained, does not rely on pre-scanned object templates or camera intrinsics, and reaches state-of-the-art performance for the tasks of object-agnostic hand-object 3D transformation and shape estimation on the SHOWMe benchmark. We also experiment on sequences from the HO3D dataset, demonstrating generalization to unseen object categories.
Authors:Luca Annese, Sabrina Patania, Silvia Serino, Tom Foulsham, Silvia Rossi, Azzurra Ruggeri, Dimitri Ognibene
Abstract:
Recent advances in large language models (LLMs) and reasoning frameworks have opened new possibilities for improving the perspective -taking capabilities of autonomous agents. However, tasks that involve active perception, collaborative reasoning, and perspective taking (understanding what another agent can see or knows) pose persistent challenges for current LLM-based systems. This study investigates the potential of structured examples derived from transformed solution graphs generated by the Fast Downward planner to improve the performance of LLM-based agents within a ReAct framework. We propose a structured solution-processing pipeline that generates three distinct categories of examples: optimal goal paths (G-type), informative node paths (E-type), and step-by-step optimal decision sequences contrasting alternative actions (L-type). These solutions are further converted into ``thought-action'' examples by prompting an LLM to explicitly articulate the reasoning behind each decision. While L-type examples slightly reduce clarification requests and overall action steps, they do not yield consistent improvements. Agents are successful in tasks requiring basic attentional filtering but struggle in scenarios that required mentalising about occluded spaces or weighing the costs of epistemic actions. These findings suggest that structured examples alone are insufficient for robust perspective-taking, underscoring the need for explicit belief tracking, cost modelling, and richer environments to enable socially grounded collaboration in LLM-based agents.
Authors:Sungwon In, Ayush Roy, Eric Krokos, Kirsten Whitley, Chris North, Yalong Yang
Abstract:
Computational notebooks, which integrate code, documentation, tags, and visualizations into a single document, have become increasingly popular for data analysis tasks. With the advent of immersive technologies, these notebooks have evolved into a new paradigm, enabling more interactive and intuitive ways to perform data analysis. An immersive computational notebook, which integrates computational notebooks within an immersive environment, significantly enhances navigation performance with embodied interactions. However, despite recognizing the significance of organizational strategies in the immersive data science process, the organizational strategies for using immersive notebooks remain largely unexplored. In response, our research aims to deepen our understanding of organizations, especially focusing on spatial structures for computational notebooks, and to examine how various execution orders can be visualized in an immersive context. Through an exploratory user study, we found participants preferred organizing notebooks in half-cylindrical structures and engaged significantly more in non-linear analysis. Notably, as the scale of the notebooks increased (i.e., more code cells), users increasingly adopted multiple, concurrent non-linear analytical approaches.
Authors:Vinit Hegiste, Vidit Goyal, Tatjana Legler, Martin Ruskowski
Abstract:
In smart manufacturing environments, accurate and real-time recognition of worker actions is essential for productivity, safety, and human-machine collaboration. While skeleton-based human activity recognition (HAR) offers robustness to lighting, viewpoint, and background variations, most existing approaches rely on centralized datasets, which are impractical in privacy-sensitive industrial scenarios. This paper presents a federated learning (FL) framework for pose-based HAR using a custom skeletal dataset of eight industrially relevant upper-body gestures, captured from five participants and processed using a modified FastPose model. Two temporal backbones, an LSTM and a Transformer encoder, are trained and evaluated under four paradigms: centralized, local (per-client), FL with weighted federated averaging (FedAvg), and federated ensemble learning (FedEnsemble). On the global test set, the FL Transformer improves over centralized training by +12.4 percentage points, with FedEnsemble delivering a +16.3 percentage points gain. On an unseen external client, FL and FedEnsemble exceed centralized accuracy by +52.6 and +58.3 percentage points, respectively. These results demonstrate that FL not only preserves privacy but also substantially enhances cross-user generalization, establishing it as a practical solution for scalable, privacy-aware HAR in heterogeneous industrial settings.
Authors:Yanming Xiu, Joshua Chilukuri, Shunav Sen, Maria Gorlatova
Abstract:
As augmented reality (AR) applications increasingly require 3D content, generative pipelines driven by natural input such as speech offer an alternative to manual asset creation. In this work, we design a modular, edge-assisted architecture that supports both direct text-to-3D and text-image-to-3D pathways, enabling interchangeable integration of state-of-the-art components and systematic comparison of their performance in AR settings. Using this architecture, we implement and evaluate four representative pipelines through an IRB-approved user study with 11 participants, assessing six perceptual and usability metrics across three object prompts. Overall, text-image-to-3D pipelines deliver higher generation quality: the best-performing pipeline, which used FLUX for image generation and Trellis for 3D generation, achieved an average satisfaction score of 4.55 out of 5 and an intent alignment score of 4.82 out of 5. In contrast, direct text-to-3D pipelines excel in speed, with the fastest, Shap-E, completing generation in about 20 seconds. Our results suggest that perceptual quality has a greater impact on user satisfaction than latency, with users tolerating longer generation times when output quality aligns with expectations. We complement subjective ratings with system-level metrics and visual analysis, providing practical insights into the trade-offs of current 3D generation methods for real-world AR deployment.
Authors:Sebastian Cmentowski, Fabian Kievelitz, Jens Krüger
Abstract:
The size of most virtual environments exceeds the tracking space available for physical walking. One solution to this disparity is to extend the available walking range by augmenting users' actual movements. However, the resulting increase in visual flow can easily cause cybersickness. Therefore, we present a novel augmented-walking approach for virtual reality games. Our core concept is a virtual tunnel that spans the entire travel distance when viewed from the outside. However, its interior is only a fraction as long, allowing users to cover the distance by real walking. Whereas the tunnel hides the visual flow from the applied movement acceleration, windows on the tunnel's walls still reveal the actual expedited motion. Our evaluation reveals that our approach avoids cybersickness while enhancing physical activity and preserving presence. We finish our paper with a discussion of the design considerations and limitations of our proposed locomotion technique.
Authors:Andrey Krekhov, Sebastian Cmentowski, Katharina Emmerich, Maic Masuch, Jens Krüger
Abstract:
Virtual reality games are often centered around our feeling of "being there". That presence can be significantly enhanced by supporting physical walking. Although modern virtual reality systems enable room-scale motions, the size of our living rooms is not enough to explore vast virtual environments. Developers bypass that limitation by adding virtual navigation such as teleportation. Although such techniques are intended (or designed) to extend but not replace natural walking, what we often observe are nonmoving players beaming to a location that is one real step ahead. Our navigation metaphor emphasizes physical walking by promoting players into giants on demand to cover large distances. In contrast to flying, our technique proportionally increases the modeled eye distance, preventing cybersickness and creating the feeling of being in a miniature world. Our evaluations underpin a significantly increased presence and walking distance compared to the teleportation approach. Finally, we derive a set of game design implications related to the integration of our technique.
Authors:Haotang Li, Zhenyu Qi, Sen He, Kebin Peng, Sheng Tan, Yili Ren, Tomas Cerny, Jiyue Zhao, Zi Wang
Abstract:
Improper sitting posture during prolonged computer use has become a significant public health concern. Traditional posture monitoring solutions face substantial barriers, including privacy concerns with camera-based systems and user discomfort with wearable sensors. This paper presents UWB-PostureGuard, a privacy-preserving ultra-wideband (UWB) sensing system that advances mobile technologies for preventive health management through continuous, contactless monitoring of ergonomic sitting posture. Our system leverages commercial UWB devices, utilizing comprehensive feature engineering to extract multiple ergonomic sitting posture features. We develop PoseGBDT to effectively capture temporal dependencies in posture patterns, addressing limitations of traditional frame-wise classification approaches. Extensive real-world evaluation across 10 participants and 19 distinct postures demonstrates exceptional performance, achieving 99.11% accuracy while maintaining robustness against environmental variables such as clothing thickness, additional devices, and furniture configurations. Our system provides a scalable, privacy-preserving mobile health solution on existing platforms for proactive ergonomic management, improving quality of life at low costs.
Authors:Mohammed Saqr, Kamila Misiejuk, Sonsoles López-Pernas
Abstract:
While research on human-AI collaboration exists, it mainly examined language learning and used traditional counting methods with little attention to evolution and dynamics of collaboration on cognitively demanding tasks. This study examines human-AI interactions while solving a complex problem. Student-AI interactions were qualitatively coded and analyzed with transition network analysis, sequence analysis and partial correlation networks as well as comparison of frequencies using chi-square and Person-residual shaded Mosaic plots to map interaction patterns, their evolution, and their relationship to problem complexity and student performance. Findings reveal a dominant Instructive pattern with interactions characterized by iterative ordering rather than collaborative negotiation. Oftentimes, students engaged in long threads that showed misalignment between their prompts and AI output that exemplified a lack of synergy that challenges the prevailing assumptions about LLMs as collaborative partners. We also found no significant correlations between assignment complexity, prompt length, and student grades suggesting a lack of cognitive depth, or effect of problem difficulty. Our study indicates that the current LLMs, optimized for instruction-following rather than cognitive partnership, compound their capability to act as cognitively stimulating or aligned collaborators. Implications for designing AI systems that prioritize cognitive alignment and collaboration are discussed.
Authors:Chao Wang, Michael Gienger, Fan Zhang
Abstract:
Expressive behaviors in robots are critical for effectively conveying their emotional states during interactions with humans. In this work, we present a framework that autonomously generates realistic and diverse robotic emotional expressions based on expert human demonstrations captured in Mixed Reality (MR). Our system enables experts to teleoperate a virtual robot from a first-person perspective, capturing their facial expressions, head movements, and upper-body gestures, and mapping these behaviors onto corresponding robotic components including eyes, ears, neck, and arms. Leveraging a flow-matching-based generative process, our model learns to produce coherent and varied behaviors in real-time in response to moving objects, conditioned explicitly on given emotional states. A preliminary test validated the effectiveness of our approach for generating autonomous expressions.
Authors:Runlin Duan, Chenfei Zhu, Yuzhao Chen, Dizhi Ma, Jingyu Shi, Ziyi Liu, Karthik Ramani
Abstract:
Conceptual product design requires designers to explore the design space of visual and functional concepts simultaneously. Sketching has long been adopted to empower concept exploration. However, current sketch-based design tools mostly emphasize visual design using emerging techniques. We present SketchConcept, a design support tool that decomposes design concepts into visual representations and functionality of concepts using sketches and textual descriptions. We propose a function-to-visual mapping workflow that maps the function descriptions generated by a Large Language Model to a component of the concept produced by image Generative Artificial Intelligence(GenAI). The function-to-visual mapping allows our system to leverage multimodal GenAI to decompose, generate, and edit the design concept to satisfy the overall function and behavior. We present multiple use cases enabled by SketchConcept to validate the workflow. Finally, we evaluated the efficacy and usability of our system with a two-session user study.
Authors:Runlin Duan, Yuzhao Chen, Rahul Jain, Yichen Hu, Jingyu Shi, Karthik Ramani
Abstract:
Generative AI (GenAI) has significantly advanced the ease and flexibility of image creation. However, it remains a challenge to precisely control spatial compositions, including object arrangement and scene conditions. To bridge this gap, we propose Canvas3D, an interactive system leveraging a 3D engine to enable precise spatial manipulation for image generation. Upon user prompt, Canvas3D automatically converts textual descriptions into interactive objects within a 3D engine-driven virtual canvas, empowering direct and precise spatial configuration. These user-defined arrangements generate explicit spatial constraints that guide generative models in accurately reflecting user intentions in the resulting images. We conducted a closed-end comparative study between Canvas3D and a baseline system. And an open-ended study to evaluate our system "in the wild". The result indicates that Canvas3D outperforms the baseline on spatial control, interactivity, and overall user experience.
Authors:Sanjana Gautam, Mohit Chandra, Ankolika De, Tatiana Chakravorti, Girik Malik, Munmun De Choudhury
Abstract:
Lived experiences fundamentally shape how individuals interact with AI systems, influencing perceptions of safety, trust, and usability. While prior research has focused on developing techniques to emulate human preferences, and proposed taxonomies to categorize risks (such as psychological harms and algorithmic biases), these efforts have provided limited systematic understanding of lived human experiences or actionable strategies for embedding them meaningfully into the AI development lifecycle. This work proposes a framework for meaningfully integrating lived experience into the design and evaluation of AI systems. We synthesize interdisciplinary literature across lived experience philosophy, human-centered design, and human-AI interaction, arguing that centering lived experience can lead to models that more accurately reflect the retrospective, emotional, and contextual dimensions of human cognition. Drawing from a wide body of work across psychology, education, healthcare, and social policy, we present a targeted taxonomy of lived experiences with specific applicability to AI systems. To ground our framework, we examine three application domains (i) education, (ii) healthcare, and (iii) cultural alignment, illustrating how lived experience informs user goals, system expectations, and ethical considerations in each context. We further incorporate insights from AI system operators and human-AI partnerships to highlight challenges in responsibility allocation, mental model calibration, and long-term system adaptation. We conclude with actionable recommendations for developing experience-centered AI systems that are not only technically robust but also empathetic, context-aware, and aligned with human realities. This work offers a foundation for future research that bridges technical development with the lived experiences of those impacted by AI systems.
Authors:Serena Tardelli, Lorenzo Alvisi, Lorenzo Cima, Stefano Cresci, Maurizio Tesconi
Abstract:
Emoji reactions are a frequently used feature of messaging platforms. Prior work mainly interpreted emojis as indicators of emotional resonance or user sentiment. However, emoji reactions may instead reflect broader social dynamics. Here, we investigate the communicative function of emoji reactions on Telegram by analyzing the relationship between the emotional and rhetorical content of messages and the emoji reactions they receive. We collect and analyze over 650k Telegram messages that received at least one emoji reaction. We annotate each message with sentiment, emotion, persuasion strategy, and speech act labels, and infer the sentiment and emotion of emoji reactions using both lexicons and large languages. We find a systematic mismatch between message sentiment and reaction sentiment, with positive reactions dominating even when the message is neutral or negative. We show that this pattern remains consistent across rhetorical strategies and emotional tones, suggesting that emoji reactions may signal a degree of social approval rather than reflecting emotional resonance. Finally, we shed light on the communicative strategies that predict greater emoji engagement. These findings have methodological implications for sentiment analysis, as interpreting emoji reactions as direct proxies for emotional response may be misleading.
Authors:Maurice Koch, Nelusa Pathmanathan, Daniel Weiskopf, Kuno Kurzhals
Abstract:
An essential task in analyzing collaborative design processes, such as those that are part of workshops in design studies, is identifying design outcomes and understanding how the collaboration between participants formed the results and led to decision-making. However, findings are typically restricted to a consolidated textual form based on notes from interviews or observations. A challenge arises from integrating different sources of observations, leading to large amounts and heterogeneity of collected data. To address this challenge we propose a practical, modular, and adaptable framework of workshop setup, multimodal data acquisition, AI-based artifact extraction, and visual analysis. Our interactive visual analysis system, reCAPit, allows the flexible combination of different modalities, including video, audio, notes, or gaze, to analyze and communicate important workshop findings. A multimodal streamgraph displays activity and attention in the working area, temporally aligned topic cards summarize participants' discussions, and drill-down techniques allow inspecting raw data of included sources. As part of our research, we conducted six workshops across different themes ranging from social science research on urban planning to a design study on band-practice visualization. The latter two are examined in detail and described as case studies. Further, we present considerations for planning workshops and challenges that we derive from our own experience and the interviews we conducted with workshop experts. Our research extends existing methodology of collaborative design workshops by promoting data-rich acquisition of multimodal observations, combined AI-based extraction and interactive visual analysis, and transparent dissemination of results.
Authors:Hyunjn An, Yongwon Kim, Wonduk Seo, Joonil Park, Daye Kang, Changhoon Oh, Dokyun Kim, Seunghyun Lee
Abstract:
While many tools are available for designing AI, non-experts still face challenges in clearly expressing their intent and managing system complexity. We introduce AIAP, a no-code platform that integrates natural language input with visual workflows. AIAP leverages a coordinated multi-agent system to decompose ambiguous user instructions into modular, actionable steps, hidden from users behind a unified interface. A user study involving 32 participants showed that AIAP's AI-generated suggestions, modular workflows, and automatic identification of data, actions, and context significantly improved participants' ability to develop services intuitively. These findings highlight that natural language-based visual programming significantly reduces barriers and enhances user experience in AI service design.
Authors:Julian Acosta, Scott Adams, Julius Kernbach, Romain Hardy, Sung Eun Kim, Luyang Luo, Xiaoman Zhang, Shreya Johri, Mohammed Baharoon, Pranav Rajpurkar
Abstract:
We developed a voice-driven artificial intelligence (AI) system that guides anyone - from paramedics to family members - through expert-level stroke evaluations using natural conversation, while also enabling smartphone video capture of key examination components for documentation and potential expert review. This addresses a critical gap in emergency care: current stroke recognition by first responders is inconsistent and often inaccurate, with sensitivity for stroke detection as low as 58%, causing life-threatening delays in treatment. Three non-medical volunteers used our AI system to assess ten simulated stroke patients, including cases with likely large vessel occlusion (LVO) strokes and stroke-like conditions, while we measured diagnostic accuracy, completion times, user confidence, and expert physician review of the AI-generated reports. The AI system correctly identified 84% of individual stroke signs and detected 75% of likely LVOs, completing evaluations in just over 6 minutes. Users reported high confidence (median 4.5/5) and ease of use (mean 4.67/5). The system successfully identified 86% of actual strokes but also incorrectly flagged 2 of 3 non-stroke cases as strokes. When an expert physician reviewed the AI reports with videos, they identified the correct diagnosis in 100% of cases, but felt confident enough to make preliminary treatment decisions in only 40% of cases due to observed AI errors including incorrect scoring and false information. While the current system's limitations necessitate human oversight, ongoing rapid advancements in speech-to-speech AI models suggest that future versions are poised to enable highly accurate assessments. Achieving human-level voice interaction could transform emergency medical care, putting expert-informed assessment capabilities in everyone's hands.
Authors:Zeyu Yan, SuHwan Hong, Josiah Hester, Tingyu Cheng, Huaishu Peng
Abstract:
We introduce DissolvPCB, an electronic prototyping technique for fabricating fully recyclable printed circuit board assemblies (PCBAs) using affordable FDM 3D printing, with polyvinyl alcohol (PVA) as a water-soluble substrate and eutectic gallium-indium (EGaIn) as the conductive material. When obsolete, the PCBA can be easily recycled by immersing it in water: the PVA dissolves, the EGaIn re-forms into a liquid metal bead, and the electronic components are recovered. These materials can then be reused to fabricate a new PCBA.
We present the DissolvPCB workflow, characterize its design parameters, evaluate the performance of circuits produced with it, and quantify its environmental impact through a lifecycle assessment (LCA) comparing it to conventional CNC-milled FR-4 boards. We further develop a software plugin that automatically converts PCB design files into 3D-printable circuit substrate models. To demonstrate the capabilities of DissolvPCB, we fabricate and recycle three functional prototypes: a Bluetooth speaker featuring a double-sided PCB, a finger fidget toy with a 3D circuit topology, and a shape-changing gripper enabled by Joule-heat-driven 4D printing. The paper concludes with a discussion of current technical limitations and opportunities for future directions.
Authors:Sabrina Patania, Luca Annese, Cansu Koyuturk, Azzurra Ruggeri, Dimitri Ognibene
Abstract:
Large Language Models (LLMs) have demonstrated remarkable capabilities in processing extensive offline datasets. However, they often face challenges in acquiring and integrating complex, knowledge online. Traditional AI training paradigms, predominantly based on supervised learning or reinforcement learning, mirror a 'Piagetian' model of independent exploration. These approaches typically rely on large datasets and sparse feedback signals, limiting the models' ability to learn efficiently from interactions. Drawing inspiration from Vygotsky's sociocultural theory, this study explores the potential of socially mediated learning paradigms to address these limitations.
We introduce a dynamic environment, termed the 'AI Social Gym', where an AI learner agent engages in dyadic pedagogical dialogues with knowledgeable AI teacher agents. These interactions emphasize external, structured dialogue as a core mechanism for knowledge acquisition, contrasting with methods that depend solely on internal inference or pattern recognition.
Our investigation focuses on how different pedagogical strategies impact the AI learning process in the context of ontology acquisition. Empirical results indicate that such dialogic approaches-particularly those involving mixed-direction interactions combining top-down explanations with learner-initiated questioning-significantly enhance the LLM's ability to acquire and apply new knowledge, outperforming both unidirectional instructional methods and direct access to structured knowledge, formats typically present in training datasets.
These findings suggest that integrating pedagogical and psychological insights into AI and robot training can substantially improve post-training knowledge acquisition and response quality. This approach offers a complementary pathway to existing strategies like prompt engineering
Authors:Hongxu Liu, Xinyu Chen, Haoyang Zheng, Manyi Li, Zhenfan Liu, Fumeng Yang, Yunhai Wang, Changhe Tu, Qiong Zeng
Abstract:
Recovering a continuous colormap from a single 2D scalar field visualization can be quite challenging, especially in the absence of a corresponding color legend. In this paper, we propose a novel colormap recovery approach that extracts the colormap from a color-encoded 2D scalar field visualization by simultaneously predicting the colormap and underlying data using a decoupling-and-reconstruction strategy. Our approach first separates the input visualization into colormap and data using a decoupling module, then reconstructs the visualization with a differentiable color-mapping module. To guide this process, we design a reconstruction loss between the input and reconstructed visualizations, which serves both as a constraint to ensure strong correlation between colormap and data during training, and as a self-supervised optimizer for fine-tuning the predicted colormap of unseen visualizations during inferencing. To ensure smoothness and correct color ordering in the extracted colormap, we introduce a compact colormap representation using cubic B-spline curves and an associated color order loss. We evaluate our method quantitatively and qualitatively on a synthetic dataset and a collection of real-world visualizations from the VIS30K dataset. Additionally, we demonstrate its utility in two prototype applications -- colormap adjustment and colormap transfer -- and explore its generalization to visualizations with color legends and ones encoded using discrete color palettes.
Authors:Mahiro Ozaki, Li Chen, Shotaro Naganuma, Valdemar Švábenský, Fumiya Okubo, Atsushi Shimada
Abstract:
This study proposes and evaluates the PAnoramic Learning Map (PALM), a learning analytics (LA) dashboard designed to address the scalability challenges of LA by integrating curriculum-level information. Traditional LA research has predominantly focused on individual courses or learners and often lacks a framework that considers the relationships between courses and the long-term trajectory of learning. To bridge this gap, PALM was developed to integrate multilayered educational data into a curriculum map, enabling learners to intuitively understand their learning records and academic progression. We conducted a system evaluation to assess PALM's effectiveness in two key areas: (1) its impact on students' awareness of their learning behaviors, and (2) its comparative performance against existing systems. The results indicate that PALM enhances learners' awareness of study planning and reflection, particularly by improving perceived behavioral control through the visual presentation of individual learning histories and statistical trends, which clarify the links between learning actions and outcomes. Although PALM requires ongoing refinement as a system, it received significantly higher evaluations than existing systems in terms of visual appeal and usability. By serving as an information resource with previously inaccessible insights, PALM enhances self-regulated learning and engagement, representing a significant step beyond conventional LA toward a comprehensive and scalable approach.
Authors:Sue Min Cho, Alexander Do, Russell H. Taylor, Mathias Unberath
Abstract:
As surgery embraces digital transformation--integrating sophisticated imaging, advanced algorithms, and robotics to support and automate complex sub-tasks--human judgment of system correctness remains a vital safeguard for patient safety. This shift introduces new "operator-type" roles tasked with verifying complex algorithmic outputs, particularly at critical junctures of the procedure, such as the intermediary check before drilling or implant placement. A prime example is 2D/3D registration, a key enabler of image-based surgical navigation that aligns intraoperative 2D images with preoperative 3D data. Although registration algorithms have advanced significantly, they occasionally yield inaccurate results. Because even small misalignments can lead to revision surgery or irreversible surgical errors, there is a critical need for robust quality assurance. Current visualization-based strategies alone have been found insufficient to enable humans to reliably detect 2D/3D registration misalignments. In response, we propose the first artificial intelligence (AI) framework trained specifically for 2D/3D registration quality verification, augmented by explainability features that clarify the model's decision-making. Our explainable AI (XAI) approach aims to enhance informed decision-making for human operators by providing a second opinion together with a rationale behind it. Through algorithm-centric and human-centered evaluations, we systematically compare four conditions: AI-only, human-only, human-AI, and human-XAI. Our findings reveal that while explainability features modestly improve user trust and willingness to override AI errors, they do not exceed the standalone AI in aggregate performance. Nevertheless, future work extending both the algorithmic design and the human-XAI collaboration elements holds promise for more robust quality assurance of 2D/3D registration.
Authors:Chen Chen, Jianing Yin, Jiannong Cao, Zhiyuan Wen, Mingjin Zhang, Weixun Gao, Xiang Wang, Haihua Shu
Abstract:
Postoperative follow-up plays a crucial role in monitoring recovery and identifying complications. However, traditional approaches, typically involving bedside interviews and manual documentation, are time-consuming and labor-intensive. Although existing digital solutions, such as web questionnaires and intelligent automated calls, can alleviate the workload of nurses to a certain extent, they either deliver an inflexible scripted interaction or face private information leakage issues. To address these limitations, this paper introduces FollowUpBot, an LLM-powered edge-deployed robot for postoperative care and monitoring. It allows dynamic planning of optimal routes and uses edge-deployed LLMs to conduct adaptive and face-to-face conversations with patients through multiple interaction modes, ensuring data privacy. Moreover, FollowUpBot is capable of automatically generating structured postoperative follow-up reports for healthcare institutions by analyzing patient interactions during follow-up. Experimental results demonstrate that our robot achieves high coverage and satisfaction in follow-up interactions, as well as high report generation accuracy across diverse field types. The demonstration video is available at https://www.youtube.com/watch?v=_uFgDO7NoK0.
Authors:Kaixin Ji, Danula Hettiachchi, Falk Scholer, Flora D. Salim, Damiano Spina
Abstract:
Information processing tasks involve complex cognitive mechanisms that are shaped by various factors, including individual goals, prior experience, and system environments. Understanding such behaviors requires a sophisticated and personalized data capture of how one interacts with modern information systems (e.g., web search engines). Passive sensors, such as wearables, capturing physiological and behavioral data, have the potential to provide solutions in this context. This paper presents a novel dataset, SenseSeek, designed to evaluate the effectiveness of consumer-grade sensors in a complex information processing scenario: searching via systems (e.g., search engines), one of the common strategies users employ for information seeking. The SenseSeek dataset comprises data collected from 20 participants, 235 trials of the stimulated search process, 940 phases of stages in the search process, including the realization of Information Need (IN), Query Formulation (QF), Query Submission by Typing (QS-T) or Speaking (QS-S), and Relevance Judgment by Reading (RJ-R) or Listening (RJ-L). The data includes Electrodermal Activities (EDA), Electroencephalogram (EEG), PUPIL, GAZE, and MOTION data, which were captured using consumer-grade sensors. It also contains 258 features extracted from the sensor data, the gaze-annotated screen recordings, and task responses. We validate the usefulness of the dataset by providing baseline analysis on the impacts of different cognitive intents and interaction modalities on the sensor data, and effectiveness of the data in discriminating the search stages. To our knowledge, SenseSeek is the first dataset that characterizes the multiple stages involved in information seeking with physiological signals collected from multiple sensors. We hope this dataset can serve as a reference for future research on information-seeking behaviors.
Authors:Lorenzo Mannocci, Stefano Cresci, Matteo Magnani, Anna Monreale, Maurizio Tesconi
Abstract:
Coordinated online behavior, which spans from beneficial collective actions to harmful manipulation such as disinformation campaigns, has become a key focus in digital ecosystem analysis. Traditional methods often rely on monomodal approaches, focusing on single types of interactions like co-retweets or co-hashtags, or consider multiple modalities independently of each other. However, these approaches may overlook the complex dynamics inherent in multimodal coordination. This study compares different ways of operationalizing the detection of multimodal coordinated behavior. It examines the trade-off between weakly and strongly integrated multimodal models, highlighting the balance between capturing broader coordination patterns and identifying tightly coordinated behavior. By comparing monomodal and multimodal approaches, we assess the unique contributions of different data modalities and explore how varying implementations of multimodality impact detection outcomes. Our findings reveal that not all the modalities provide distinct insights, but that with a multimodal approach we can get a more comprehensive understanding of coordination dynamics. This work enhances the ability to detect and analyze coordinated online behavior, offering new perspectives for safeguarding the integrity of digital platforms.
Authors:Xiaolin Wen, Qishuang Fu, Shuangyue Han, Yichen Guo, Joseph K. Liu, Yong Wang
Abstract:
Graph querying is the process of retrieving information from graph data using specialized languages (e.g., Cypher), often requiring programming expertise. Visual Graph Querying (VGQ) streamlines this process by enabling users to construct and execute queries via an interactive interface without resorting to complex coding. However, current VGQ tools only allow users to construct simple and specific query graphs, limiting users' ability to interactively express their query intent, especially for underspecified query intent. To address these limitations, we propose Envisage, an interactive visual graph querying system to enhance the expressiveness of VGQ in complex query scenarios by supporting intuitive graph structure construction and flexible parameterized rule specification. Specifically, Envisage comprises four stages: Query Expression allows users to interactively construct graph queries through intuitive operations; Query Verification enables the validation of constructed queries via rule verification and query instantiation; Progressive Query Execution can progressively execute queries to ensure meaningful querying results; and Result Analysis facilitates result exploration and interpretation. To evaluate Envisage, we conducted two case studies and in-depth user interviews with 14 graph analysts. The results demonstrate its effectiveness and usability in constructing, verifying, and executing complex graph queries.
Authors:Changjian Chen, Pengcheng Wang, Fei Lyu, Zhuo Tang, Li Yang, Long Wang, Yong Cai, Feng Yu, Kenli Li
Abstract:
Hybrid rice breeding crossbreeds different rice lines and cultivates the resulting hybrids in fields to select those with desirable agronomic traits, such as higher yields. Recently, genomic selection has emerged as an efficient way for hybrid rice breeding. It predicts the traits of hybrids based on their genes, which helps exclude many undesired hybrids, largely reducing the workload of field cultivation. However, due to the limited accuracy of genomic prediction models, breeders still need to combine their experience with the models to identify regulatory genes that control traits and select hybrids, which remains a time-consuming process. To ease this process, in this paper, we proposed a visual analysis method to facilitate interactive hybrid rice breeding. Regulatory gene identification and hybrid selection naturally ensemble a dual-analysis task. Therefore, we developed a parametric dual projection method with theoretical guarantees to facilitate interactive dual analysis. Based on this dual projection method, we further developed a gene visualization and a hybrid visualization to verify the identified regulatory genes and hybrids. The effectiveness of our method is demonstrated through the quantitative evaluation of the parametric dual projection method, identified regulatory genes and desired hybrids in the case study, and positive feedback from breeders.
Authors:Diana Romero, Yasra Chandio, Fatima Anwar, Salma Elmalaki
Abstract:
Understanding how teams coordinate, share work, and negotiate roles in immersive environments is critical for designing effective mixed-reality (MR) applications that support real-time collaboration. However, existing methods either rely on external cameras and offline annotation or focus narrowly on single modalities, limiting their validity and applicability. To address this, we present a novel group interaction sensing toolkit (GIST), a deployable system that passively captures multi-modal interaction data, such as speech, gaze, and spatial proximity from commodity MR headset's sensors and automatically derives both overall static interaction networks and dynamic moment-by-moment behavior patterns. We evaluate GIST with a human subject study with 48 participants across 12 four-person groups performing an open-ended image-sorting task in MR. Our analysis shows strong alignment between the identified behavior modes and shifts in interaction network structure, confirming that momentary changes in speech, gaze, and proximity data are observable through the sensor data.
Authors:Shaolun Ruan, Rui Sheng, Xiaolin Wen, Jiachen Wang, Tianyi Zhang, Yong Wang, Tim Dwyer, Jiannan Li
Abstract:
Design studies aim to create visualization solutions for real-world problems of different application domains. Recently, the emergence of large language models (LLMs) has introduced new opportunities to enhance the design study process, providing capabilities such as creative problem-solving, data handling, and insightful analysis. However, despite their growing popularity, there remains a lack of systematic understanding of how LLMs can effectively assist researchers in visualization-specific design studies. In this paper, we conducted a multi-stage qualitative study to fill this gap, involving 30 design study researchers from diverse backgrounds and expertise levels. Through in-depth interviews and carefully-designed questionnaires, we investigated strategies for utilizing LLMs, the challenges encountered, and the practices used to overcome them. We further compiled and summarized the roles that LLMs can play across different stages of the design study process. Our findings highlight practical implications to inform visualization practitioners, and provide a framework for leveraging LLMs to enhance the design study process in visualization research.
Authors:Haocan Sun, Weizi Liu, Di Wu, Guoming Yu, Mike Yao
Abstract:
Trust is one of the most important factors shaping whether and how people adopt and rely on artificial intelligence (AI). Yet most existing studies measure trust in terms of functionality, focusing on whether a system is reliable, accurate, or easy to use, while giving less attention to the social and emotional dimensions that are increasingly relevant for today's generative AI (GenAI) systems. These systems do not just process information; they converse, respond, and collaborate with users, blurring the line between tool and partner. In this study, we introduce and validate the Human-AI Trust Scale (HAITS), a new measure designed to capture both the rational and relational aspects of trust in GenAI. Drawing on prior trust theories, qualitative interviews, and two waves of large-scale surveys in China and the United States, we used exploratory (n = 1,546) and confirmatory (n = 1,426) factor analyses to identify four key dimensions of trust: Affective Trust, Competence Trust, Benevolence & Integrity, and Perceived Risk. We then applied latent profile analysis to classify users into six distinct trust profiles, revealing meaningful differences in how affective-competence trust and trust-distrust frameworks coexist across individuals and cultures. Our findings offer a validated, culturally sensitive tool for measuring trust in GenAI and provide new insight into how trust evolves in human-AI interaction. By integrating instrumental and relational perspectives of trust, this work lays the foundation for more nuanced research and design of trustworthy AI systems.
Authors:Haocan Sun, Di Wu, Weizi Liu, Guoming Yu, Mike Yao
Abstract:
Concerns over the potential over-pathologization of generative AI (GenAI) use and the lack of conceptual clarity surrounding GenAI addiction call for empirical tools and theoretical refinement. This study developed and validated the PUGenAIS-9 (Problematic Use of Generative Artificial Intelligence Scale-9 items) and examined whether PUGenAIS reflects addiction-like patterns under the Internet Gaming Disorder (IGD) framework. Using samples from China and the United States (N = 1,508), we conducted confirmatory factor analysis and identified a robust 31-item structure across nine IGD-based dimensions. We then derived the PUGenAIS-9 by selecting the highest-loading items from each dimension and validated its structure in an independent sample (N = 1,426). Measurement invariance tests confirmed its stability across nationality and gender. Person-centered (latent profile analysis) and variable-centered (network analysis) approaches revealed a 5-10% prevalence rate, a symptom network structure similar to IGD, and predictive factors related to psychological distress and functional impairment. These findings indicate that PUGenAI shares features of the emotionally vulnerable subtype of IGD rather than the competence-based type. These results support using PUGenAIS-9 to identify problematic GenAI use and show the need to rethink digital addiction with an ICD (infrastructures, content, and device) model. This keeps addiction research responsive to new media while avoiding over-pathologizing.
Authors:Jinyan Su, Claire Cardie, Jennifer Healey
Abstract:
Multi-hop question answering is a challenging task for both large language models (LLMs) and humans, as it requires recognizing when multi-hop reasoning is needed, followed by reading comprehension, logical reasoning, and knowledge integration. To better understand how humans might collaborate effectively with AI, we evaluate the performance of crowd workers on these individual reasoning subtasks. We find that while humans excel at knowledge integration (97\% accuracy), they often fail to recognize when a question requires multi-hop reasoning (67\% accuracy). Participants perform reasonably well on both single-hop and multi-hop QA (84\% and 80\% accuracy, respectively), but frequently make semantic mistakes--for example, answering "when" an event happened when the question asked "where." These findings highlight the importance of designing AI systems that complement human strengths while compensating for common weaknesses.
Authors:Zeya Chen, Jianing Wen, Ruth Schmidt, Yaxing Yao, Toby Jia-Jun Li, Tianshi Li
Abstract:
UX professionals routinely conduct design reviews, yet privacy concerns are often overlooked -- not only due to limited tools, but more critically because of low intrinsic motivation. Limited privacy knowledge, weak empathy for unexpectedly affected users, and low confidence in identifying harms make it difficult to address risks. We present PrivacyMotiv, an LLM-powered system that supports privacy-oriented design diagnosis by generating speculative personas with UX user journeys centered on individuals vulnerable to privacy risks. Drawing on narrative strategies, the system constructs relatable and attention-drawing scenarios that show how ordinary design choices may cause unintended harms, expanding the scope of privacy reflection in UX. In a within-subjects study with professional UX practitioners (N=16), we compared participants' self-proposed methods with PrivacyMotiv across two privacy review tasks. Results show significant improvements in empathy, intrinsic motivation, and perceived usefulness. This work contributes a promising privacy review approach which addresses the motivational barriers in privacy-aware UX.
Authors:Kristoffer Christensen, Bo Nørregaard Jørgensen, Zheng Grace Ma
Abstract:
High-quality data is a prerequisite for training reliable Artificial Intelligence (AI) models in the energy domain. In district heating networks, sensor and metering data often suffer from noise, missing values, and temporal inconsistencies, which can significantly degrade model performance. This paper presents a systematic approach for evaluating and improving data quality using visual diagnostics, implemented through an interactive web-based dashboard. The dashboard employs Python-based visualization techniques, including time series plots, heatmaps, box plots, histograms, correlation matrices, and anomaly-sensitive KPIs such as skewness and anomaly detection based on the modified z-scores. These tools al-low human experts to inspect and interpret data anomalies, enabling a human-in-the-loop strategy for data quality assessment. The methodology is demonstrated on a real-world dataset from a Danish district heating provider, covering over four years of hourly data from nearly 7000 meters. The findings show how visual analytics can uncover systemic data issues and, in the future, guide data cleaning strategies that enhance the accuracy, stability, and generalizability of Long Short-Term Memory and Gated Recurrent Unit models for heat demand forecasting. The study contributes to a scalable, generalizable framework for visual data inspection and underlines the critical role of data quality in AI-driven energy management systems.
Authors:Jared Hwang, Chu Li, Hanbyul Kang, Maryam Hosseini, Jon E. Froehlich
Abstract:
Accessible parking is critical for people with disabilities (PwDs), allowing equitable access to destinations, independent mobility, and community participation. Despite mandates, there has been no large-scale investigation of the quality or allocation of disability parking in the US nor significant research on PwD perspectives and uses of disability parking. In this paper, we first present a semi-structured interview study with 11 PwDs to advance understanding of disability parking uses, concerns, and relevant technology tools. We find that PwDs often adapt to disability parking challenges according to their personal mobility needs and value reliable, real-time accessibility information. Informed by these findings, we then introduce a new deep learning pipeline, called AccessParkCV, and parking dataset for automatically detecting disability parking and inferring quality characteristics (e.g., width) from orthorectified aerial imagery. We achieve a micro-F1=0.89 and demonstrate how our pipeline can support new urban analytics and end-user tools. Together, we contribute new qualitative understandings of disability parking, a novel detection pipeline and open dataset, and design guidelines for future tools.
Authors:Adnan Abbas, Caleb Wohn, Arnav Jagtap, Eugenia H Rho, Young-Ho Kim, Sang Won Lee
Abstract:
Conversational agents have been studied as tools to scaffold planning and self-reflection for productivity and well-being. While prior work has demonstrated positive outcomes, we still lack a clear understanding of what drives these results and how users behave and communicate with agents that act as coaches rather than assistants. Such understanding is critical for designing interactions in which agents foster meaningful behavioral change. We conducted a 14-day longitudinal study with 12 participants using a proactive agent that initiated regular check-ins to support daily planning and reflection. Our findings reveal diverse interaction patterns: participants accepted or negotiated suggestions, developed shared mental models, reported progress, and at times resisted or disengaged. We also identified problematic aspects of the agent's behavior, including rigidity, premature turn-taking, and overpromising. Our work contributes to understanding how people interact with a proactive, coach-like agent and offers design considerations for facilitating effective behavioral change.
Authors:Michele Romani, Francesco Paissan, Andrea FossÃ, Elisabetta Farella
Abstract:
Brain-Computer Interfaces (BCIs) suffer from high inter-subject variability and limited labeled data, often requiring lengthy calibration phases. In this work, we present an end-to-end approach that explicitly models the subject dependency using lightweight convolutional neural networks (CNNs) conditioned on the subject's identity. Our method integrates hyperparameter optimization strategies that prioritize class imbalance and evaluates two conditioning mechanisms to adapt pre-trained models to unseen subjects with minimal calibration data. We benchmark three lightweight architectures on a time-modulated Event-Related Potentials (ERP) classification task, providing interpretable evaluation metrics and explainable visualizations of the learned representations. Results demonstrate improved generalization and data-efficient calibration, highlighting the scalability and practicality of subject-adaptive BCIs.
Authors:Sven Jacobs, Jan Haas, Natalie Kiesler
Abstract:
How students utilize immediate tutoring feedback in programming education depends on various factors. Among them are the feedback quality, but also students' engagement, i.e., their perception, interpretation, and use of feedback. However, there is limited research on how students engage with various types of tutoring feedback. For this reason, we developed a learning environment that provides students with Python programming tasks and various types of immediate, AI-generated tutoring feedback. The feedback is displayed within four components. Using a mixed-methods approach (think-aloud study and eye-tracking), we conducted a study with 20 undergraduate students enrolled in an introductory programming course. Our research aims to: (1) identify what students think when they engage with the tutoring feedback components, and (2) explore the relations between the tutoring feedback components, students' visual attention, verbalized thoughts, and their immediate actions as part of the problem-solving process. The analysis of students' thoughts while engaging with 380 feedback components revealed four main themes: students express understanding or disagreement, additional information needed, and students explicitly judge the feedback. Exploring the relations between feedback, students' attention, thoughts, and actions showed a clear relationship. While expressions of understanding were associated with improvements, expressions of disagreement or need for additional information prompted students to collect another feedback component rather than act on the current information. These insights into students' engagement and decision-making processes contribute to an increased understanding of tutoring feedback and how students engage with it. Thereby, this work has implications for tool developers and educators facilitating feedback.
Authors:Felix Glawe, Laura Kremer, Luisa Vervier, Philipp Brauner, Martina Ziefle
Abstract:
Collaborative robots (cobots) are a core technology of Industry 4.0. Industry 4.0 uses cyber-physical systems, IoT and smart automation to improve efficiency and data-driven decision-making. Cobots, as cyber-physical systems, enable the introduction of lightweight automation to smaller companies through their flexibility, low cost and ability to work alongside humans, while keeping humans and their skills in the loop. Industry 5.0, the evolution of Industry 4.0, places the worker at the centre of its principles: The physical and mental well-being of the worker is the main goal of new technology design, not just productivity, efficiency and safety standards. Within this concept, human trust in cobots and human autonomy are important. While trust is essential for effective and smooth interaction, the workers' perception of autonomy is key to intrinsic motivation and overall well-being. As failures are an inevitable part of technological systems, this study aims to answer the question of how system failures affect trust in cobots as well as human autonomy, and how they can be recovered afterwards. Therefore, a VR experiment (n = 39) was set up to investigate the influence of a cobot failure and its severity on human autonomy and trust in the cobot. Furthermore, the influence of transparent communication about the failure and next steps was investigated. The results show that both trust and autonomy suffer after cobot failures, with the severity of the failure having a stronger negative impact on trust, but not on autonomy. Both trust and autonomy can be partially restored by transparent communication.
Authors:Felix Glawe, Tim Schmeckel, Philipp Brauner, Martina Ziefle
Abstract:
Human autonomy and sense of agency are increasingly recognised as critical for user well-being, motivation, and the ethical deployment of robots in human-robot interaction (HRI). Given the rapid development of artificial intelligence, robot capabilities and their potential to function as colleagues and companions are growing. This systematic literature review synthesises 22 empirical studies selected from an initial pool of 728 articles published between 2011 and 2024. Articles were retrieved from major scientific databases and identified based on empirical focus and conceptual relevance, namely, how to preserve and promote human autonomy and sense of agency in HRI. Derived through thematic synthesis, five clusters of potentially influential factors are revealed: robot adaptiveness, communication style, anthropomorphism, presence of a robot and individual differences. Measured through psychometric scales or the intentional binding paradigm, perceptions of autonomy and agency varied across industrial, educational, healthcare, care, and hospitality settings. The review underscores the theoretical differences between both concepts, but their yet entangled use in HRI. Despite increasing interest, the current body of empirical evidence remains limited and fragmented, underscoring the necessity for standardised definitions, more robust operationalisations, and further exploratory and qualitative research. By identifying existing gaps and highlighting emerging trends, this review contributes to the development of human-centered, autonomy-supportive robot design strategies that uphold ethical and psychological principles, ultimately supporting well-being in human-robot interaction.
Authors:Sangwook Lee, Adnan Abbas, Yan Chen, Young-Ho Kim, Sang Won Lee
Abstract:
University research labs often rely on chat-based platforms for communication and project management, where valuable knowledge surfaces but is easily lost in message streams. Documentation can preserve knowledge, but it requires ongoing maintenance and is challenging to navigate. Drawing on formative interviews that revealed organizational memory challenges in labs, we designed CHOIR, an LLM-based chatbot that supports organizational memory through four key functions: document-grounded Q&A, Q&A sharing for follow-up discussion, knowledge extraction from conversations, and AI-assisted document updates. We deployed CHOIR in four research labs for one month (n=21), where the lab members asked 107 questions and lab directors updated documents 38 times in the organizational memory. Our findings reveal a privacy-awareness tension: questions were asked privately, limiting directors' visibility into documentation gaps. Students often avoided contribution due to challenges in generalizing personal experiences into universal documentation. We contribute design implications for privacy-preserving awareness and supporting context-specific knowledge documentation.
Authors:Paris Koloveas, Serafeim Chatzopoulos, Thanasis Vergoulis, Christos Tryfonopoulos
Abstract:
The proliferation of scientific literature presents an increasingly significant challenge for researchers. While Large Language Models (LLMs) offer promise, existing tools often provide verbose summaries that risk replacing, rather than assisting, the reading of the source material. This paper introduces InsightGUIDE, a novel AI-powered tool designed to function as a reading assistant, not a replacement. Our system provides concise, structured insights that act as a "map" to a paper's key elements by embedding an expert's reading methodology directly into its core AI logic. We present the system's architecture, its prompt-driven methodology, and a qualitative case study comparing its output to a general-purpose LLM. The results demonstrate that InsightGUIDE produces more structured and actionable guidance, serving as a more effective tool for the modern researcher.
Authors:Charlotte Lambert, Agam Goyal, Eunice Mok, Eshwar Chandrasekharan
Abstract:
Online communities are constantly growing, with dozens of platforms housing millions of users. Large and small communities alike rely on volunteer moderators to maintain order. Despite their key role, moderators are given a toolbox of punishments and asked to fend off barrages of harmful content. However, prior research shows that positive feedback may proactively encourage higher quality contributions and discourage norm violations. Moreover, moderators themselves have requested support for locating and rewarding content to encourage in their communities. These requests notwithstanding, there is a tangible lack of practical support through tools. Building off moderators' ideas, we build a novel moderation system, the Positive Queue, that augments Reddit's existing moderator interface with features to discover and reward desirable content. Through a user study of moderators, we find that the system has value to vastly different moderation settings. We present design directions and insights for incorporating positive moderation strategies into existing spaces.
Authors:Lawrence Obiuwevwi, Krzysztof J. Rechowicz, Vikas Ashok, Sampath Jayarathna
Abstract:
Diabetes mellitus is a growing global health issue, with Type 1 Diabetes (T1D) requiring constant monitoring to avoid hypoglycemia. Although Continuous Glucose Monitors (CGMs) are effective, their cost and invasiveness limit access, particularly in low-resource settings. This paper proposes a non-invasive method to classify glycemic states using Galvanic Skin Response (GSR), a biosignal commonly captured by wearable sensors. We use the merged OhioT1DM 2018 and 2020 datasets to build a machine learning pipeline that detects hypoglycemia (glucose < 70 mg/dl) and normoglycemia (glucose > 70 mg/dl) with GSR alone. Seven models are trained and evaluated: Random Forest, XGBoost, MLP, CNN, LSTM, Logistic Regression, and K-Nearest Neighbors. Validation sets and 95% confidence intervals are reported to increase reliability and assess robustness. Results show that the LSTM model achieves a perfect hypoglycemia recall (1.00) with an F1-score confidence interval of [0.611-0.745], while XGBoost offers strong performance with a recall of 0.54 even under class imbalance. This approach highlights the potential for affordable, wearable-compatible glucose monitoring tools suitable for settings with limited CGM availability using GSR data. Index Terms: Hypoglycemia Detection, Galvanic Skin Response, Non Invasive Monitoring, Wearables, Machine Learning, Confidence Intervals.
Authors:Yeonsun Yang, Sang Won Lee, Jean Y. Song, Sangdoo Yun, Young-Ho Kim
Abstract:
Non-native English speakers performing English-related tasks at work struggle to sustain ESL learning, despite their motivation. Often, study materials are disconnected from their work context. Although workers rely on LLM assistants to address their immediate needs, these interactions may not directly contribute to their English skills. We present LingoQ, an AI-mediated system that allows workers to practice English using quizzes generated from their LLM queries during work. LingoQ leverages these queries using AI to generate personalized quizzes that workers can review and practice on their smartphones. We conducted a three-week deployment study with 28 ESL workers to evaluate LingoQ. Participants valued the relevance of quizzes that reflect their own context, constantly engaging with the app during the study. This active engagement improved self-efficacy and led to learning gains for beginners and, potentially, for intermediate learners. We discuss opportunities of leveraging users' reliance on LLMs to situate their learning in the user context for improved learning.
Authors:Zihan Ding, Xinyi Wang, Junlong Chen, Per Ola Kristensson, Junxiao Shen
Abstract:
Creators struggle to edit long-form, narrative-rich videos not because of UI complexity, but due to the cognitive demands of searching, storyboarding, and sequencing hours of footage. Existing transcript- or embedding-based methods fall short for creative workflows, as models struggle to track characters, infer motivations, and connect dispersed events. We present a prompt-driven, modular editing system that helps creators restructure multi-hour content through free-form prompts rather than timelines. At its core is a semantic indexing pipeline that builds a global narrative via temporal segmentation, guided memory compression, and cross-granularity fusion, producing interpretable traces of plot, dialogue, emotion, and context. Users receive cinematic edits while optionally refining transparent intermediate outputs. Evaluated on 400+ videos with expert ratings, QA, and preference studies, our system scales prompt-driven editing, preserves narrative coherence, and balances automation with creator control.
Authors:Tao Long, Xuanming Zhang, Sitong Wang, Zhou Yu, Lydia B Chilton
Abstract:
Agentic workflows promise efficiency, but adoption hinges on whether people actually trust systems that act on their behalf. We present DoubleAgents, an agentic planning tool that embeds transparency and control through user intervention, value-reflecting policies, rich state visualizations, and uncertainty flagging for human coordination tasks. A built-in respondent simulation generates realistic scenarios, allowing users to rehearse, refine policies, and calibrate their reliance before live use. We evaluate DoubleAgents in a two-day lab study (n=10), two deployments (n=2), and a technical evaluation. Results show that participants initially hesitated to delegate but grew more reliant as they experienced transparency, control, and adaptive learning during simulated cases. Deployment results demonstrate DoubleAgents' real-world relevance and usefulness, showing that the effort required scaled appropriately with task complexity and contextual data. We contribute trust-by-design patterns and mechanisms for proactive AI -- consistency, controllability, and explainability -- along with simulation as a safe path to build and calibrate trust over time.
Authors:Kumushini Thennakoon, Yasasi Abeysinghe, Bhanuka Mahanama, Vikas Ashok, Sampath Jayarathna
Abstract:
Joint visual attention (JVA) provides informative cues on human behavior during social interactions. The ubiquity of egocentric eye-trackers and large-scale datasets on everyday interactions offer research opportunities in identifying JVA in multi-user environments. We propose a novel approach utilizing spatiotemporal tubes centered on attention rendered by individual gaze and detect JVA using deep-learning-based feature mapping. Our results reveal object-focused collaborative tasks to yield higher JVA (44-46%), whereas independent tasks yield lower (4-5%) attention. Beyond JVA, we analyze attention characteristics using ambient-focal attention coefficient K to understand the qualitative aspects of shared attention. Our analysis reveals $\mathcal{K}$ to converge instances where participants interact with shared objects while diverging when independent. While our study presents seminal findings on joint attention with egocentric commodity eye trackers, it indicates the potential utility of our approach in psychology, human-computer interaction, and social robotics, particularly in understanding attention coordination mechanisms in ecologically valid contexts.
Authors:Mengke Wu, Kexin Quan, Weizi Liu, Mike Yao, Jessie Chin
Abstract:
The growing popularity of AI writing assistants presents exciting opportunities to craft tools that cater to diverse user needs. This study explores how personality shapes preferences for AI writing companions and how personalized designs can enhance human-AI teaming. In an exploratory co-design workshop, we worked with 24 writers with different profiles to surface ideas and map the design space for personality-aligned AI writing companions, focusing on functionality, interaction dynamics, and visual representations. Building on these insights, we developed two contrasting prototypes tailored to distinct writer profiles and engaged 8 participants with them as provocations to spark reflection and feedback. The results revealed strong connections between writer profiles and feature preferences, providing proof-of-concept for personality-driven divergence in AI writing support. This research highlights the critical role of team match in human-AI collaboration and underscores the importance of aligning AI systems with individual cognitive needs to improve user engagement and collaboration productivity.
Authors:Mengke Wu, Weizi Liu, Yanyun Wang, Weiyu Ding, Mike Yao
Abstract:
Smart recommendation algorithms have revolutionized content delivery and improved efficiency across various domains. However, concerns about user agency persist due to their inherent opacity (information asymmetry) and one-way influence (power asymmetry). This study introduces a provotype designed to enhance user agency by providing actionable transparency and control over data management and content delivery. We conducted qualitative interviews with 19 participants to explore their preferences and concerns regarding the features, as well as the provotype's impact on users' understanding and trust toward recommender systems. Findings underscore the importance of integrating transparency with control, and reaffirm users' desire for agency and the ability to actively intervene in personalization. We also discuss insights for encouraging adoption and awareness of such agency-enhancing features. Overall, this study contributes novel approaches and applicable insights, laying the groundwork for designing more user-centered recommender systems that foreground user autonomy and fairness in AI-driven content delivery.
Authors:Sven Jacobs, Natalie Kiesler
Abstract:
Real-time voice interfaces using multimodal Generative AI (GenAI) can potentially address the accessibility needs of novice programmers with disabilities (e.g., related to vision). Yet, little is known about how novices interact with GenAI tools and their feedback quality in the form of audio output. This paper analyzes audio dialogues from nine 9th-grade students using a voice-enabled tutor (powered by OpenAI's Realtime API) in an authentic classroom setting while learning Python. We examined the students' voice prompts and AI's responses (1210 messages) by using qualitative coding. We also gathered students' perceptions via the Partner Modeling Questionnaire. The GenAI Voice Tutor primarily offered feedback on mistakes and next steps, but its correctness was limited (71.4% correct out of 416 feedback outputs). Quality issues were observed, particularly when the AI attempted to utter programming code elements. Students used the GenAI voice tutor primarily for debugging. They perceived it as competent, only somewhat human-like, and flexible. The present study is the first to explore the interaction dynamics of real-time voice GenAI tutors and novice programmers, informing future educational tool design and potentially addressing accessibility needs of diverse learners.
Authors:Agam Goyal, Charlotte Lambert, Eshwar Chandrasekharan
Abstract:
Positive feedback via likes and awards is central to online governance, yet which attributes of users' posts elicit rewards -- and how these vary across authors and communities -- remains unclear. To examine this, we combine quasi-experimental causal inference with predictive modeling on 11M posts from 100 subreddits. We identify linguistic patterns and stylistic attributes causally linked to rewards, controlling for author reputation, timing, and community context. For example, overtly complicated language, tentative style, and toxicity reduce rewards. We use our set of curated features to train models that can detect highly-upvoted posts with high AUC. Our audit of community guidelines highlights a ``policy-practice gap'' -- most rules focus primarily on civility and formatting requirements, with little emphasis on the attributes identified to drive positive feedback. These results inform the design of community guidelines, support interfaces that teach users how to craft desirable contributions, and moderation workflows that emphasize positive reinforcement over purely punitive enforcement.
Authors:Shashidhar Reddy Javaji, Bhavul Gauri, Zining Zhu
Abstract:
Large language models (LLMs) are now used in multi-turn workflows, but we still lack a clear way to measure when iteration helps and when it hurts. We present an evaluation framework for iterative refinement that spans ideation, code, and math. Our protocol runs controlled 12-turn conversations per task, utilizing a variety of prompts ranging from vague ``improve it'' feedback to targeted steering, and logs per-turn outputs. We score outcomes with domain-appropriate checks (unit tests for code; answer-equivalence plus reasoning-soundness for math; originality and feasibility for ideation) and track turn-level behavior with three families of metrics: semantic movement across turns, turn-to-turn change, and output size growth. Across models and tasks, gains are domain-dependent: they arrive early in ideas and code, but in math late turns matter when guided by elaboration. After the first few turns, vague feedback often plateaus or reverses correctness, while targeted prompts reliably shift the intended quality axis (novelty vs. feasibility in ideation; speed vs. readability in code; in math, elaboration outperforms exploration and drives late-turn gains). We also observe consistent domain patterns: ideation moves more in meaning across turns, code tends to grow in size with little semantic change, and math starts fixed but can break that path with late, elaborative iteration. Together, the framework and metrics make iteration measurable and comparable across models, and signal when to steer, stop, or switch strategies.
Authors:Michele Materazzini, Gianluca Morciano, Jose Manuel Alcalde-Llergo, Enrique Yeguas-Bolivar, Giuseppe Calabro, Andrea Zingoni, Juri Taborri
Abstract:
This study explores the use of virtual reality (VR) and artificial intelligence (AI) to predict the presence of dyslexia in Italian and Spanish university students. In particular, the research investigates whether VR-derived data from Silent Reading (SR) tests and self-esteem assessments can differentiate between students that are affected by dyslexia and students that are not, employing machine learning (ML) algorithms. Participants completed VR-based tasks measuring reading performance and self-esteem. A preliminary statistical analysis (t tests and Mann Whitney tests) on these data was performed, to compare the obtained scores between individuals with and without dyslexia, revealing significant differences in completion time for the SR test, but not in accuracy, nor in self esteem. Then, supervised ML models were trained and tested, demonstrating an ability to classify the presence/absence of dyslexia with an accuracy of 87.5 per cent for Italian, 66.6 per cent for Spanish, and 75.0 per cent for the pooled group. These findings suggest that VR and ML can effectively be used as supporting tools for assessing dyslexia, particularly by capturing differences in task completion speed, but language-specific factors may influence classification accuracy.
Authors:Jose Manuel Alcalde-Llergo, Andrea Zingoni, Pilar Aparicio-Martinez, Sara Pinzi, Enrique Yeguas-Bolivar
Abstract:
Dyslexia is a neurodevelopmental disorder estimated to strike approximately 5 to 10 per cent of the population. In particular, phonological dyslexia causes problems in connecting the sounds of words with their written forms. Consequently, affected individuals may encounter issues such as slow reading speed, inaccurate reading, and difficulty decoding unfamiliar words. To address these complexities, the use of compensatory tools and strategies is essential to ensure equitable opportunities for dyslexic students. However, the general underestimation of the issue and lack of awareness regarding the significance of support methodologies pose significant obstacles. One of the ways to enhance consciousness towards a certain issue is by stimulating empathy with whom is affected by it. In light of this, this study introduces a serious game in virtual reality, targeted at educators, students, and, in general, at the non-dyslexic community. The game seeks to enhance understanding of the challenges that individuals with dyslexia experience daily, highlighting the relevance of supportive measures. This approach encourages players to empathize with the struggles of dyslexic individuals and to learn firsthand the importance of supportive methodologies. The final version of the experience was tested by 101 participants and evaluated through a specific collection of questionnaires validated in the literature. The results show that using the proposed virtual reality tool to promote empathy for individuals with phonological dyslexia is highly effective, leading to an average 20 per cent increase in participants' empathy after playing the game.
Authors:Xia Su, Ruiqi Chen, Jingwei Ma, Chu Li, Jon E. Froehlich
Abstract:
Indoor mapping data is crucial for routing, navigation, and building management, yet such data are widely lacking due to the manual labor and expense of data collection, especially for larger indoor spaces. Leveraging recent advancements in commodity drones and photogrammetry, we introduce FlyMeThrough -- a drone-based indoor scanning system that efficiently produces 3D reconstructions of indoor spaces with human-AI collaborative annotations for key indoor points-of-interest (POI) such as entrances, restrooms, stairs, and elevators. We evaluated FlyMeThrough in 12 indoor spaces with varying sizes and functionality. To investigate use cases and solicit feedback from target stakeholders, we also conducted a qualitative user study with five building managers and five occupants. Our findings indicate that FlyMeThrough can efficiently and precisely create indoor 3D maps for strategic space planning, resource management, and navigation.
Authors:Kinga Skiers, Yun Suen Pai, Marina Nakagawa, Kouta Minamizawa, Giulia Barbareschi
Abstract:
Neurodivergent individuals, particularly those with Autism and Attention Deficit Hyperactivity Disorder (ADHD), frequently experience anxiety, panic attacks, meltdowns, and emotional dysregulation due to societal pressures and inadequate accommodations. These challenges are especially pronounced for neurodivergent women and non-binary individuals navigating intersecting barriers of neurological differences and gender expectations. This research investigates virtual reality (VR) as a portable safe space for emotional regulation, addressing challenges of sensory overload and motion sickness while enhancing relaxation capabilities. Our mixed-methods approach included an online survey (N=223) and an ideation workshop (N=32), which provided key design elements for creating effective calming VR environments. Based on these findings, we developed and iteratively tested VR prototypes with neurodivergent women and non-binary participants (N=12), leading to a final version offering enhanced adaptability to individual sensory needs. This final prototype underwent a comprehensive evaluation with 25 neurodivergent participants to assess its effectiveness as a regulatory tool. This research contributes to the development of inclusive, adaptive VR environments that function as personalized "portable silent rooms" offering neurodivergent individuals on-demand access to sensory regulation regardless of physical location.
Authors:Arlen Fan, Fan Lei, Steven R. Corman, Ross Maciejewski
Abstract:
The proliferation of misinformation in journalism, often stemming from flawed reasoning and logical fallacies, poses significant challenges to public understanding and trust in news media. Traditional fact-checking methods, while valuable, are insufficient for detecting the subtle logical inconsistencies that can mislead readers within seemingly factual content. To address this gap, we introduce Skeptik, a hybrid framework that integrates Large Language Models (LLMs) with heuristic approaches to analyze and annotate potential logical fallacies and reasoning errors in online news articles. Operating as a web browser extension, Skeptik automatically highlights sentences that may contain logical fallacies, provides detailed explanations, and offers multi-layered interventions to help readers critically assess the information presented. The system is designed to be extensible, accommodating a wide range of fallacy types and adapting to evolving misinformation tactics. Through comprehensive case studies, quantitative analyses, usability experiments, and expert evaluations, we demonstrate the effectiveness of Skeptik in enhancing readers' critical examination of news content and promoting media literacy. Our contributions include the development of an expandable classification system for logical fallacies, the innovative integration of LLMs for real-time analysis and annotation, and the creation of an interactive user interface that fosters user engagement and close reading. By emphasizing the logical integrity of textual content rather than relying solely on factual accuracy, Skeptik offers a comprehensive solution to combat potential misinformation in journalism. Ultimately, our framework aims to improve critical reading and protect the public from deceptive information online and enhance the overall credibility of news media.
Authors:Zhao Ren, Simon Pistrosch, Buket CoÅkun, Kevin Scheck, Anton Batliner, Björn W. Schuller, Tanja Schultz
Abstract:
The ability to speak is an inherent part of human nature and fundamental to our existence as a social species. Unfortunately, this ability can be restricted in certain situations, such as for individuals who have lost their voice or in environments where speaking aloud is unsuitable. Additionally, some people may prefer not to speak audibly due to privacy concerns. For such cases, silent speech interfaces have been proposed, which focus on processing biosignals corresponding to silently produced speech. These interfaces enable synthesis of audible speech from biosignals that are produced when speaking silently and recognition aka decoding of biosignals into text that corresponds to the silently produced speech. While recognition and synthesis of silent speech has been a prominent focus in many research studies, there is a significant gap in deriving paralinguistic information such as affective states from silent speech. To fill this gap, we propose Silent Paralinguistics, aiming to predict paralinguistic information from silent speech and ultimately integrate it into the reconstructed audible voice for natural communication. This survey provides a comprehensive look at methods, research strategies, and objectives within the emerging field of silent paralinguistics.
Authors:Ryan Ghamandi, Yahya Hmaiti, Mykola Maslych, Ravi Kiran Kattoju, Joseph J. LaViola
Abstract:
We explore natural user interactions using a virtual reality simulation of a robot arm for assembly tasks. Using a Wizard-of-Oz study, participants completed collaborative LEGO and instructive PCB assembly tasks, with the robot responding under experimenter control. We collected voice, hand tracking, and gaze data from users. Statistical analyses revealed that instructive and collaborative scenarios elicit distinct behaviors and adopted strategies, particularly as tasks progress. Users tended to use put-that-there language in spatially ambiguous contexts and more descriptive instructions in spatially clear ones. Our contributions include the identification of natural interaction strategies through analyses of collected data, as well as the supporting dataset, to guide the understanding and design of natural multimodal user interfaces for instructive interaction with systems in virtual reality.
Authors:Azmine Toushik Wasi, Rahatun Nesa Priti, Mahir Absar Khan, Abdur Rahman, Mst Rafia Islam
Abstract:
Deepfakes, AI-generated multimedia content that mimics real media, are becoming increasingly prevalent, posing significant risks to political stability, social trust, and economic well-being, especially in developing societies with limited media literacy and technological infrastructure. This work aims to understand how these technologies are perceived and impact resource-limited communities. We conducted a survey to assess public awareness, perceptions, and experiences with deepfakes, leading to the development of a comprehensive framework for prevention, detection, and mitigation in tech-limited environments. Our findings reveal critical knowledge gaps and a lack of effective detection tools, emphasizing the need for targeted education and accessible verification solutions. This work offers actionable insights to support vulnerable populations and calls for further interdisciplinary efforts to tackle deepfake challenges globally, particularly in the Global South.
Authors:Anton Belichenko, Daria Trinitatova, Aigul Nasibullina, Lev Yakovlev, Dzmitry Tsetserukou
Abstract:
Understanding the neural correlates of sensory imagery is crucial for advancing cognitive neuroscience and developing novel Brain-Computer Interface (BCI) paradigms. This study investigated the influence of imagined temperature sensations (ITS) on neural activity within the sensorimotor cortex. The experimental study involved the evaluation of neural activity using electroencephalography (EEG) during both real thermal stimulation (TS: 40°C Hot, 20°C Cold) applied to the participants' hand, and the mental temperature imagination (ITS) of the corresponding hot and cold sensations. The analysis focused on quantifying the event-related desynchronization (ERD) of the sensorimotor mu-rhythm (8-13 Hz). The experimental results revealed a characteristic mu-ERD localized over central scalp regions (e.g., C3) during both TS and ITS conditions. Although the magnitude of mu-ERD during ITS was slightly lower than during TS, this difference was not statistically significant (p>.05). However, ERD during both ITS and TS was statistically significantly different from the resting baseline (p<.001). These findings demonstrate that imagining temperature sensations engages sensorimotor cortical mechanisms in a manner comparable to actual thermal perception. This insight expands our understanding of the neurophysiological basis of sensory imagery and suggests the potential utility of ITS for non-motor BCI control and neurorehabilitation technologies.
Authors:Jixian Li, Timbwaoga Aime Judicael Ouermi, Mengjiao Han, Chris R. Johnson
Abstract:
Predicting particle trajectories with neural networks (NNs) has substantially enhanced many scientific and engineering domains. However, effectively quantifying and visualizing the inherent uncertainty in predictions remains challenging. Without an understanding of the uncertainty, the reliability of NN models in applications where trustworthiness is paramount is significantly compromised. This paper introduces the uncertainty tube, a novel, computationally efficient visualization method designed to represent this uncertainty in NN-derived particle paths. Our key innovation is the design and implementation of a superelliptical tube that accurately captures and intuitively conveys nonsymmetric uncertainty. By integrating well-established uncertainty quantification techniques, such as Deep Ensembles, Monte Carlo Dropout (MC Dropout), and Stochastic Weight Averaging-Gaussian (SWAG), we demonstrate the practical utility of the uncertainty tube, showcasing its application on both synthetic and simulation datasets.
Authors:Truong Thanh Hung Nguyen, Tran Diem Quynh Nguyen, Hoang Loc Cao, Thi Cam Thanh Tran, Thi Cam Mai Truong, Hung Cao
Abstract:
Business interview preparation demands both solid theoretical grounding and refined soft skills, yet conventional classroom methods rarely deliver the individualized, culturally aware practice employers currently expect. This paper introduces SimInterview, a large language model (LLM)-based simulated multilingual interview training system designed for business professionals entering the AI-transformed labor market. Our system leverages an LLM agent and synthetic AI technologies to create realistic virtual recruiters capable of conducting personalized, real-time conversational interviews. The framework dynamically adapts interview scenarios using retrieval-augmented generation (RAG) to match individual resumes with specific job requirements across multiple languages. Built on LLMs (OpenAI o3, Llama 4 Maverick, Gemma 3), integrated with Whisper speech recognition, GPT-SoVITS voice synthesis, Ditto diffusion-based talking head generation model, and ChromaDB vector databases, our system significantly improves interview readiness across English and Japanese markets. Experiments with university-level candidates show that the system consistently aligns its assessments with job requirements, faithfully preserves resume content, and earns high satisfaction ratings, with the lightweight Gemma 3 model producing the most engaging conversations. Qualitative findings revealed that the standardized Japanese resume format improved document retrieval while diverse English resumes introduced additional variability, and they highlighted how cultural norms shape follow-up questioning strategies. Finally, we also outlined a contestable AI design that can explain, detect bias, and preserve human-in-the-loop to meet emerging regulatory expectations.
Authors:Matt Gottsacker, Yahya Hmaiti, Mykola Maslych, Gerd Bruder, Joseph J. LaViola, Gregory F. Welch
Abstract:
A core component of completing tasks efficiently in computer-supported knowledge work is the ability for users to rapidly switch their focus (and interaction) across different applications using various shortcuts and gestures. This feature set has been explored in research, and several modern consumer extended reality (XR) headsets now support loading multiple applications windows at once. However, many XR applications that are useful for knowledge work involve rich spatial information, which window-based metaphors do not sufficiently represent nor afford appropriate interaction. In modern XR headsets, such immersive applications run as siloed experiences, requiring the user to fully exit one before starting another. We present a vision for achieving an XR-first, user-centric paradigm for efficient context switching in XR to encourage and guide future research and development of XR context- and task-switching interfaces.
Authors:Nishanth Chidambaram, Weichen Liu, Manas Satish Bedmutha, Nadir Weibel, Chen Chen
Abstract:
Using head-mounted Virtual Reality (VR) displays to simulate driving is critical to studying driving behavior and designing driver assistance systems. But existing VR driving simulators are often limited to tracking only eye movements. The bulky outside-in tracking setup and Unreal-based architecture also present significant engineering challenges for interaction researchers and practitioners. We present DriveSimQuest, a VR driving simulator and research platform built on the Meta Quest Pro and Unity, capable of capturing rich behavioral signals such as gaze, facial expressions, hand activities, and full-body gestures in real-time. DriveSimQuest offers a preliminary, easy-to-deploy platform that supports researchers and practitioners in studying drivers' affective states and behaviors, and in designing future context-aware driving assistance systems.
Authors:Abhijit Sinha, Harishankar Kumar, Mohit Joshi, Hemant Kumar Kathania, Shrikanth Narayanan, Sudarsana Reddy Kadiri
Abstract:
Children's speech presents challenges for age and gender classification due to high variability in pitch, articulation, and developmental traits. While self-supervised learning (SSL) models perform well on adult speech tasks, their ability to encode speaker traits in children remains underexplored. This paper presents a detailed layer-wise analysis of four Wav2Vec2 variants using the PFSTAR and CMU Kids datasets. Results show that early layers (1-7) capture speaker-specific cues more effectively than deeper layers, which increasingly focus on linguistic information. Applying PCA further improves classification, reducing redundancy and highlighting the most informative components. The Wav2Vec2-large-lv60 model achieves 97.14% (age) and 98.20% (gender) on CMU Kids; base-100h and large-lv60 models reach 86.05% and 95.00% on PFSTAR. These results reveal how speaker traits are structured across SSL model depth and support more targeted, adaptive strategies for child-aware speech interfaces.
Authors:Yuxiao Wang, Wolin Liang, Yu Lei, Weiying Xue, Nan Zhuang, Qi Liu
Abstract:
Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions in images. Although DETR-based methods have recently emerged as the mainstream framework for HOI detection, they still suffer from a key limitation: Randomly initialized queries lack explicit semantics, leading to suboptimal detection performance. To address this challenge, we propose QueryCraft, a novel plug-and-play HOI detection framework that incorporates semantic priors and guided feature learning through transformer-based query initialization. Central to our approach is \textbf{ACTOR} (\textbf{A}ction-aware \textbf{C}ross-modal \textbf{T}ransf\textbf{OR}mer), a cross-modal Transformer encoder that jointly attends to visual regions and textual prompts to extract action-relevant features. Rather than merely aligning modalities, ACTOR leverages language-guided attention to infer interaction semantics and produce semantically meaningful query representations. To further enhance object-level query quality, we introduce a \textbf{P}erceptual \textbf{D}istilled \textbf{Q}uery \textbf{D}ecoder (\textbf{PDQD}), which distills object category awareness from a pre-trained detector to serve as object query initiation. This dual-branch query initialization enables the model to generate more interpretable and effective queries for HOI detection. Extensive experiments on HICO-Det and V-COCO benchmarks demonstrate that our method achieves state-of-the-art performance and strong generalization. Code will be released upon publication.
Authors:Celina Kacperski, Florian Kutzner
Abstract:
Electromobility can contribute to a reduction in greenhouse gas emissions if usage behavior is aligned with the increasing availability of renewable energy. To achieve this, smart navigation systems can be used to inform drivers of optimal charging times and locations. Yet, required flexibility may impart time penalties. We investigate the impact of financial and symbolic incentive schemes to counteract these additional costs. In a laboratory experiment with real-life time costs, we find that monetary and symbolic incentives are both effective in changing behavior towards 'greener' charging choices, while we find no significant statistical difference between them.
Authors:Zeyu Yang, Zheng Wei, Yang Zhang, Xian Xu, Changyang He, Muzhi Zhou, Pan Hui
Abstract:
Livestream shopping platforms often overlook the accessibility needs of the Deaf and Hard of Hearing (DHH) community, leading to barriers such as information inaccessibility and overload. To tackle these challenges, we developed \textit{EchoAid}, a mobile app designed to improve the livestream shopping experience for DHH users. \textit{EchoAid} utilizes advanced speech-to-text conversion, Rapid Serial Visual Presentation (RSVP) technology, and Large Language Models (LLMs) to simplify the complex information flow in live sales environments. We conducted exploratory studies with eight DHH individuals to identify design needs and iteratively developed the \textit{EchoAid} prototype based on feedback from three participants. We then evaluate the performance of this system in a user study workshop involving 38 DHH participants. Our findings demonstrate the successful design and validation process of \textit{EchoAid}, highlighting its potential to enhance product information extraction, leading to reduced cognitive overload and more engaging and customized shopping experiences for DHH users.
Authors:Nello Balossino, Rossana Damiano, Cristina Gena, Alberto Lillo, Anna Maria Marras, Claudio Mattutino, Antonio Pizzo, Alessia Prin, Fabiana Vernero
Abstract:
There are still many museums that present accessibility barriers, particularly regarding perceptual, cultural, and cognitive aspects. This is especially evident in low-density population areas. The aim of the ROBSO-PM project is to improve the accessibility of small museums through the use of social robots and social telepresence robots, focusing on three museums as case studies: the Museum of the Holy Shroud in Turin, a small but globally known institution, and two lesser known mountain museums: the Museum of the Champlas du Col Carnival and the Pragelato Museum of Alpine Peoples' Costumes and Traditions. The project explores two main applications for robots: as guides supporting inclusive visits for foreign or disabled visitors, and as telepresence tools allowing people with limited mobility to access museums remotely. From a research perspective, key topics include storytelling, robot personality, empathy, personalization, and, in the case of telepresence, collaboration between the robot and the person, with clearly defined roles and autonomy.
Authors:Kosmas Pinitas, Konstantinos Makantasis, Georgios N. Yannakakis
Abstract:
Affective Computing (AC) has made significant progress with the advent of deep learning, yet a persistent challenge remains: the reliable transfer of affective models from controlled laboratory settings (in-vitro) to uncontrolled real-world environments (in-vivo). To address this challenge we introduce the Privileged Contrastive Pretraining (PriCon) framework according to which models are first pretrained via supervised contrastive learning (SCL) and then act as teacher models within a Learning Using Privileged Information (LUPI) framework. PriCon both leverages privileged information during training and enhances the robustness of derived affect models via SCL. Experiments conducted on two benchmark affective corpora, RECOLA and AGAIN, demonstrate that models trained using PriCon consistently outperform LUPI and end to end models. Remarkably, in many cases, PriCon models achieve performance comparable to models trained with access to all modalities during both training and testing. The findings underscore the potential of PriCon as a paradigm towards further bridging the gap between in-vitro and in-vivo affective modelling, offering a scalable and practical solution for real-world applications.
Authors:Yaxin Hu, Arissa J. Sato, Jingxin Du, Chenming Ye, Anjun Zhu, Pragathi Praveena, Bilge Mutlu
Abstract:
Robotic telepresence enables users to navigate and experience remote environments. However, effective navigation and situational awareness depend on users' prior knowledge of the environment, limiting the usefulness of these systems for exploring unfamiliar places. We explore how integrating location-aware LLM-based narrative capabilities into a mobile robot can support remote exploration. We developed a prototype system, called NarraGuide, that provides narrative guidance for users to explore and learn about a remote place through a dialogue-based interface. We deployed our prototype in a geology museum, where remote participants (n=20) used the robot to tour the museum. Our findings reveal how users perceived the robot's role, engaged in dialogue in the tour, and expressed preferences for bystander encountering. Our work demonstrates the potential of LLM-enabled robotic capabilities to deliver location-aware narrative guidance and enrich the experience of exploring remote environments.
Authors:Zixin Chen, Jiachen Wang, Yumeng Li, Haobo Li, Chuhan Shi, Rong Zhang, Huamin Qu
Abstract:
Grading project reports are increasingly significant in today's educational landscape, where they serve as key assessments of students' comprehensive problem-solving abilities. However, it remains challenging due to the multifaceted evaluation criteria involved, such as creativity and peer-comparative achievement. Meanwhile, instructors often struggle to maintain fairness throughout the time-consuming grading process. Recent advances in AI, particularly large language models, have demonstrated potential for automating simpler grading tasks, such as assessing quizzes or basic writing quality. However, these tools often fall short when it comes to complex metrics, like design innovation and the practical application of knowledge, that require an instructor's educational insights into the class situation. To address this challenge, we conducted a formative study with six instructors and developed CoGrader, which introduces a novel grading workflow combining human-LLM collaborative metrics design, benchmarking, and AI-assisted feedback. CoGrader was found effective in improving grading efficiency and consistency while providing reliable peer-comparative feedback to students. We also discuss design insights and ethical considerations for the development of human-AI collaborative grading systems.
Authors:Alexandros Gazis, Eleftheria Katsiri
Abstract:
E-polis is a serious digital game designed to gamify sociological surveys studying young people's political opinions. In this platform game, players navigate a digital world, encountering quests posing sociological questions. Players' answers shape the city-game world, altering building structures based on their choices. E-polis is a serious game, not a government simulation, aiming to understand players' behaviors and opinions thus we do not train the players but rather understand them and help them visualize their choices in shaping a city's future. Also, it is noticed that no correct or incorrect answers apply. Moreover, our game utilizes a novel middleware architecture for development, diverging from typical asset prefab scene and script segregation. This article presents the data layer of our game's middleware, specifically focusing on data analysis based on respondents' gameplay answers. E-polis represents an innovative approach to gamifying sociological research, providing a unique platform for gathering and analyzing data on political opinions among youth and contributing to the broader field of serious games.
Authors:Chang Chen, Sicheng Song, Shuchang Xu, Zhicheng Li, Huamin Qu, Yanna Lin
Abstract:
English speech rhythm, the temporal patterns of stressed syllables, is essential for English as a second language (ESL) learners to produce natural-sounding and comprehensible speech. Rhythm training is generally based on imitation of native speech. However, it relies heavily on external instructor feedback, preventing ESL learners from independent practice. To address this gap, we present RhythmTA, an interactive system for ESL learners to practice speech rhythm independently via dubbing, an imitation-based approach. The system automatically extracts rhythm from any English speech and introduces novel visual designs to support three stages of dubbing practice: (1) Synchronized listening with visual aids to enhance perception, (2) Guided repeating by visual cues for self-adjustment, and (3) Comparative reflection from a parallel view for self-monitoring. Our design is informed by a formative study with nine spoken English instructors, which identified current practices and challenges. A user study with twelve ESL learners demonstrates that RhythmTA effectively enhances learners' rhythm perception and shows significant potential for improving rhythm production.
Authors:Liang Zhang, Xiaoming Zhai, Jionghao Lin, Jionghao Lin, Jennifer Kleiman, Diego Zapata-Rivera, Carol Forsyth, Yang Jiang, Xiangen Hu, Arthur C. Graesser
Abstract:
Large Language Model (LLM) agents are increasingly utilized in AI-aided education to support tutoring and learning. Effective communication strategies among LLM agents improve collaborative problem-solving efficiency and facilitate cost-effective adoption in education. However, little research has systematically evaluated the impact of different communication strategies on agents' problem-solving. Our study examines four communication modes, \textit{teacher-student interaction}, \textit{peer-to-peer collaboration}, \textit{reciprocal peer teaching}, and \textit{critical debate}, in a dual-agent, chat-based mathematical problem-solving environment using the OpenAI GPT-4o model. Evaluated on the MATH dataset, our results show that dual-agent setups outperform single agents, with \textit{peer-to-peer collaboration} achieving the highest accuracy. Dialogue acts like statements, acknowledgment, and hints play a key role in collaborative problem-solving. While multi-agent frameworks enhance computational tasks, effective communication strategies are essential for tackling complex problems in AI education.
Authors:Liwenhan Xie, Yanna Lin, Can Liu, Huamin Qu, Xinhuan Shu
Abstract:
Creating aesthetically pleasing data visualizations remains challenging for users without design expertise or familiarity with visualization tools. To address this gap, we present DataWink, a system that enables users to create custom visualizations by adapting high-quality examples. Our approach combines large multimodal models (LMMs) to extract data encoding from existing SVG-based visualization examples, featuring an intermediate representation of visualizations that bridges primitive SVG and visualization programs. Users may express adaptation goals to a conversational agent and control the visual appearance through widgets generated on demand. With an interactive interface, users can modify both data mappings and visual design elements while maintaining the original visualization's aesthetic quality. To evaluate DataWink, we conduct a user study (N=12) with replication and free-form exploration tasks. As a result, DataWink is recognized for its learnability and effectiveness in personalized authoring tasks. Our results demonstrate the potential of example-driven approaches for democratizing visualization creation.
Authors:Zihe Ran, Xiyu Li, Qing Xiao, Yanyun Wang, Franklin Mingzhe Li, Zhicong Lu
Abstract:
Mobile games are becoming a vital medium for social interaction, offering a platform that transcends geographical boundaries. An increasing number of visually impaired individuals are engaging in mobile gaming to connect, collaborate, compete, and build friendships. In China, visually impaired communities face significant social challenges in offline settings, making mobile games a crucial avenue for socialization. However, the design of mobile games and their mapping to real-world environments significantly shape their social gaming experiences. This study explores how visually impaired players in China navigate socialization and integrate into gaming communities. Through interviews with 30 visually impaired players, we found that while mobile games fulfill many of their social needs, technological barriers and insufficient accessibility features, and internal community divisions present significant challenges to their participation. This research sheds light on their social experiences and offers insights for designing more inclusive and accessible mobile games.
Authors:Shayla Sharmin, Roghayeh Leila Barmaki
Abstract:
The estimation of cognitive effort could potentially help educators to modify material to enhance learning effectiveness and student engagement. Where cognitive load refers how much work the brain is doing while someone is learning or doing a task cognitive effort consider both load and behavioral performance. Cognitive effort can be captured by measuring oxygen flow and behavioral performance during a task. This study infers cognitive effort metrics using machine learning models based on oxygenated hemoglobin collected by using functional near-infrared spectroscopy from the prefrontal cortex during an educational gameplay. In our study, sixteen participants responded to sixteen questions in an in-house Unity-based educational game. The quiz was divided into two sessions, each session consisting of two task segments. We extracted temporal statistical and functional connectivity features from collected oxygenated hemoglobin and analyzed their correlation with quiz performance. We trained multiple machine learning models to predict quiz performance from oxygenated hemoglobin features and achieved accuracies ranging from 58\% to 67\% accuracy. These predictions were used to calculate cognitive effort via relative neural involvement and efficiency, which consider both brain activation and behavioral performance. Although quiz score predictions achieved moderate accuracy, the derived relative neural efficiency and involvement values remained robust. Since both metrics are based on the relative positions of standardized brain activation and performance scores, even small misclassifications in predicted scores preserved the overall cognitive effort trends observed during gameplay.
Authors:Rui Sheng, Xingbo Wang, Jiachen Wang, Xiaofu Jin, Zhonghua Sheng, Zhenxing Xu, Suraj Rajendran, Huamin Qu, Fei Wang
Abstract:
Eligibility criteria play a critical role in clinical trials by determining the target patient population, which significantly influences the outcomes of medical interventions. However, current approaches for designing eligibility criteria have limitations to support interactive exploration of the large space of eligibility criteria. They also ignore incorporating detailed characteristics from the original electronic health record (EHR) data for criteria refinement. To address these limitations, we proposed TrialCompass, a visual analytics system integrating a novel workflow, which can empower clinicians to iteratively explore the vast space of eligibility criteria through knowledge-driven and outcome-driven approaches. TrialCompass supports history-tracking to help clinicians trace the evolution of their adjustments and decisions when exploring various forms of data (i.e., eligibility criteria, outcome metrics, and detailed characteristics of original EHR data) through these two approaches. This feature can help clinicians comprehend the impact of eligibility criteria on outcome metrics and patient characteristics, which facilitates systematic refinement of eligibility criteria. Using a real-world dataset, we demonstrated the effectiveness of TrialCompass in providing insights into designing eligibility criteria for septic shock and sepsis-associated acute kidney injury. We also discussed the research prospects of applying visual analytics to clinical trials.
Authors:Daria Trinitatova, Dzmitry Tsetserukou
Abstract:
The applications of fingertip haptic devices have spread to various fields from revolutionizing virtual reality and medical training simulations to facilitating remote robotic operations, proposing great potential for enhancing user experiences, improving training outcomes, and new forms of interaction. In this work, we present FiDTouch, a 3D wearable haptic device that delivers cutaneous stimuli to the finger pad, such as contact, pressure, encounter, skin stretch, and vibrotactile feedback. The application of a tiny inverted Delta robot in the mechanism design allows providing accurate contact and fast changing dynamic stimuli to the finger pad surface. The performance of the developed display was evaluated in a two-stage user study of the perception of static spatial contact stimuli and skin stretch stimuli generated on the finger pad. The proposed display, by providing users with precise touch and force stimuli, can enhance user immersion and efficiency in the fields of human-computer and human-robot interactions.
Authors:Ferran GebellÃ, AnaÃs Garrell, Jan-Gerrit Habekost, Séverin Lemaignan, Stefan Wermter, Raquel Ros
Abstract:
In the field of Human-Robot Interaction (HRI), a fundamental challenge is to facilitate human understanding of robots. The emerging domain of eXplainable HRI (XHRI) investigates methods to generate explanations and evaluate their impact on human-robot interactions. Previous works have highlighted the need to personalise the level of detail of these explanations to enhance usability and comprehension. Our paper presents a framework designed to update and retrieve user knowledge-memory models, allowing for adapting the explanations' level of detail while referencing previously acquired concepts. Three architectures based on our proposed framework that use Large Language Models (LLMs) are evaluated in two distinct scenarios: a hospital patrolling robot and a kitchen assistant robot. Experimental results demonstrate that a two-stage architecture, which first generates an explanation and then personalises it, is the framework architecture that effectively reduces the level of detail only when there is related user knowledge.
Authors:Luke Guerdan, Devansh Saxena, Stevie Chancellor, Zhiwei Steven Wu, Kenneth Holstein
Abstract:
Data scientists often formulate predictive modeling tasks involving fuzzy, hard-to-define concepts, such as the "authenticity" of student writing or the "healthcare need" of a patient. Yet the process by which data scientists translate fuzzy concepts into a concrete, proxy target variable remains poorly understood. We interview fifteen data scientists in education (N=8) and healthcare (N=7) to understand how they construct target variables for predictive modeling tasks. Our findings suggest that data scientists construct target variables through a bricolage process, in which they use creative and pragmatic approaches to make do with the limited data at hand. Data scientists attempt to satisfy five major criteria for a target variable through bricolage: validity, simplicity, predictability, portability, and resource requirements. To achieve this, data scientists adaptively apply problem (re)formulation strategies, such as swapping out one candidate target variable for another when the first fails to meet certain criteria (e.g., predictability), or composing multiple outcomes into a single target variable to capture a more holistic set of modeling objectives. Based on our findings, we present opportunities for future HCI, CSCW, and ML research to better support the art and science of target variable construction.
Authors:Tinghe Hong, Shenlin Cai, Boyang Li, Kai Huang
Abstract:
Ophthalmic surgical robots offer superior stability and precision by reducing the natural hand tremors of human surgeons, enabling delicate operations in confined surgical spaces. Despite the advancements in developing vision- and force-based control methods for surgical robots, preoperative navigation remains heavily reliant on manual operation, limiting the consistency and increasing the uncertainty. Existing eye gaze estimation techniques in the surgery, whether traditional or deep learning-based, face challenges including dependence on additional sensors, occlusion issues in surgical environments, and the requirement for facial detection. To address these limitations, this study proposes an innovative eye localization and tracking method that combines machine learning with traditional algorithms, eliminating the requirements of landmarks and maintaining stable iris detection and gaze estimation under varying lighting and shadow conditions. Extensive real-world experiment results show that our proposed method has an average estimation error of 0.58 degrees for eye orientation estimation and 2.08-degree average control error for the robotic arm's movement based on the calculated orientation.
Authors:Xingyu Xiao, Jiejuan Tong, Peng Chen, Jun Sun, Zhe Sui, Jingang Liang, Hongru Zhao, Jun Zhao, Haitao Wang
Abstract:
Human reliability remains a critical concern in safety-critical domains such as nuclear power, where operational failures are often linked to human error. While conventional human reliability analysis (HRA) methods have been widely adopted, they rely heavily on expert judgment for identifying human failure events (HFEs) and assigning performance influencing factors (PIFs). This reliance introduces challenges related to reproducibility, subjectivity, and limited integration of interface-level data. In particular, current approaches lack the capacity to rigorously assess how human-machine interface design contributes to operator performance variability and error susceptibility. To address these limitations, this study proposes a framework for risk-informed human failure event identification and interface-induced risk assessment driven by AutoGraph (InSight-R). By linking empirical behavioral data to the interface-embedded knowledge graph (IE-KG) constructed by the automated graph-based execution framework (AutoGraph), the InSight-R framework enables automated HFE identification based on both error-prone and time-deviated operational paths. Furthermore, we discuss the relationship between designer-user conflicts and human error. The results demonstrate that InSight-R not only enhances the objectivity and interpretability of HFE identification but also provides a scalable pathway toward dynamic, real-time human reliability assessment in digitalized control environments. This framework offers actionable insights for interface design optimization and contributes to the advancement of mechanism-driven HRA methodologies.
Authors:Jianxiao Jiang, Yu Zhang
Abstract:
In the age of AI-powered educational (AIED) innovation, evaluating the developmental consequences of novel designs before they are exposed to students has become both essential and challenging. Since such interventions may carry irreversible effects, it is critical to anticipate not only potential benefits but also possible harms. This study proposes a student development agent framework based on large language models (LLMs), designed to simulate how students with diverse characteristics may evolve under different educational settings without administering them to real students. By validating the approach through a case study on a multi-agent learning environment (MAIC), we demonstrate that the agent's predictions align with real student outcomes in non-cognitive developments. The results suggest that LLM-based simulations hold promise for evaluating AIED innovations efficiently and ethically. Future directions include enhancing profile structures, incorporating fine-tuned or small task-specific models, validating effects of empirical findings, interpreting simulated data and optimizing evaluation methods.
Authors:Ruixuan Sun, Junyuan Wang, Sanjali Roy, Joseph A. Konstan
Abstract:
Natural language-based user profiles in recommender systems have been explored for their interpretability and potential to help users scrutinize and refine their interests, thereby improving recommendation quality. Building on this foundation, we introduce a human-AI collaborative profile for a movie recommender system that presents editable personalized interest summaries of a user's movie history. Unlike static profiles, this design invites users to directly inspect, modify, and reflect on the system's inferences. In an eight-week online field deployment with 1775 active movie recommender users, we find persistent gaps between user-perceived and system-inferred interests, show how the profile encourages engagement and reflection, and identify design directions for leveraging imperfect AI-powered user profiles to stimulate more user intervention and build more transparent and trustworthy recommender experiences.
Authors:Joshua Holstein, Gerhard Satzger
Abstract:
Artificial intelligence has become integral to organizational decision-making and while research has explored many facets of this human-AI collaboration, the focus has mainly been on designing the AI agent(s) and the way the collaboration is set up - generally assuming a human decision-maker to be "fixed". However, it has largely been neglected that decision-makers' mental models evolve through their continuous interaction with AI systems. This paper addresses this gap by conceptualizing how the design of human-AI collaboration influences the development of three complementary and interdependent mental models necessary for this collaboration. We develop an integrated socio-technical framework that identifies the mechanisms driving the mental model evolution: data contextualization, reasoning transparency, and performance feedback. Our work advances human-AI collaboration literature through three key contributions: introducing three distinct mental models (domain, information processing, complementarity-awareness); recognizing the dynamic nature of mental models; and establishing mechanisms that guide the purposeful design of effective human-AI collaboration.
Authors:Fabio Morreale, Wiebke Hutiri, Joan Serrà, Alice Xiang, Yuki Mitsufuji
Abstract:
The rise of AI-generated music is diluting royalty pools and revealing structural flaws in existing remuneration frameworks, challenging the well-established artist compensation systems in the music industry. Existing compensation solutions, such as piecemeal licensing agreements, lack scalability and technical rigour, while current data attribution mechanisms provide only uncertain estimates and are rarely implemented in practice. This paper introduces a framework for a generative music infrastructure centred on direct attribution, transparent royalty distribution, and granular control for artists and rights' holders. We distinguish ontologically between the training set and the inference set, which allows us to propose two complementary forms of attribution: training-time attribution and inference-time attribution. We here favour inference-time attribution, as it enables direct, verifiable compensation whenever an artist's catalogue is used to condition a generated output. Besides, users benefit from the ability to condition generations on specific songs and receive transparent information about attribution and permitted usage. Our approach offers an ethical and practical solution to the pressing need for robust compensation mechanisms in the era of AI-generated music, ensuring that provenance and fairness are embedded at the core of generative systems.
Authors:Weiying Liu, Yanran Yuan, Zhiqiang Sheng, Dandan Lian, Sheng Li, Yufan Zhang, Yulong Bian, Juan Liu
Abstract:
Autism Spectrum Disorder (ASD) is marked by action imitation deficits stemming from visuomotor integration impairments, posing challenges to imitation-based learning, such as dance movement therapy in mixed reality (MR-DMT). Previous gaze-guiding interventions in ASD have mainly focused on optimizing gaze in isolation, neglecting the crucial "gaze-performance link". This study investigates enhancing this link in MR-DMT for children with ASD. Initially, we experimentally confirmed the weak link: longer gaze durations didn't translate to better performance. Then, we proposed and validated a novel dual-level visual guidance system that operates on both perceptual and transformational levels: not only directing attention to task-relevant areas but also explicitly scaffolding the translation from gaze perception to performance execution. Our results demonstrate its effectiveness in boosting the gaze-performance link, laying key foundations for more precisely tailored and effective MR-DMT interventions for ASD.
Authors:Timbwaoga A. J. Ouermi, Eric Li, Kenneth Moreland, Dave Pugmire, Chris R. Johnson, Tushar M. Athawale
Abstract:
This work focuses on visualizing uncertainty of local divergence of two-dimensional vector fields. Divergence is one of the fundamental attributes of fluid flows, as it can help domain scientists analyze potential positions of sources (positive divergence) and sinks (negative divergence) in the flow. However, uncertainty inherent in vector field data can lead to erroneous divergence computations, adversely impacting downstream analysis. While Monte Carlo (MC) sampling is a classical approach for estimating divergence uncertainty, it suffers from slow convergence and poor scalability with increasing data size and sample counts. Thus, we present a two-fold contribution that tackles the challenges of slow convergence and limited scalability of the MC approach. (1) We derive a closed-form approach for highly efficient and accurate uncertainty visualization of local divergence, assuming independently Gaussian-distributed vector uncertainties. (2) We further integrate our approach into Viskores, a platform-portable parallel library, to accelerate uncertainty visualization. In our results, we demonstrate significantly enhanced efficiency and accuracy of our serial analytical (speed-up up to 1946X) and parallel Viskores (speed-up up to 19698X) algorithms over the classical serial MC approach. We also demonstrate qualitative improvements of our probabilistic divergence visualizations over traditional mean-field visualization, which disregards uncertainty. We validate the accuracy and efficiency of our methods on wind forecast and ocean simulation datasets.
Authors:Leopold Müller, Joshua Holstein, Sarah Bause, Gerhard Satzger, Niklas Kühl
Abstract:
Organizations increasingly adopt Retrieval-Augmented Generation (RAG) to enhance Large Language Models with enterprise-specific knowledge. However, current data quality (DQ) frameworks have been primarily developed for static datasets, and only inadequately address the dynamic, multi-stage nature of RAG systems. This study aims to develop DQ dimensions for this new type of AI-based systems. We conduct 16 semi-structured interviews with practitioners of leading IT service companies. Through a qualitative content analysis, we inductively derive 15 distinct DQ dimensions across the four processing stages of RAG systems: data extraction, data transformation, prompt & search, and generation. Our findings reveal that (1) new dimensions have to be added to traditional DQ frameworks to also cover RAG contexts; (2) these new dimensions are concentrated in early RAG steps, suggesting the need for front-loaded quality management strategies, and (3) DQ issues transform and propagate through the RAG pipeline, necessitating a dynamic, step-aware approach to quality management.
Authors:Akash Kumar Panda, Olaoluwa Adigun, Bart Kosko
Abstract:
A large language model (LLM) can map a feedback causal fuzzy cognitive map (FCM) into text and then reconstruct the FCM from the text. This explainable AI system approximates an identity map from the FCM to itself and resembles the operation of an autoencoder (AE). Both the encoder and the decoder explain their decisions in contrast to black-box AEs. Humans can read and interpret the encoded text in contrast to the hidden variables and synaptic webs in AEs. The LLM agent approximates the identity map through a sequence of system instructions that does not compare the output to the input. The reconstruction is lossy because it removes weak causal edges or rules while it preserves strong causal edges. The encoder preserves the strong causal edges even when it trades off some details about the FCM to make the text sound more natural.
Authors:Divyanshu Kumar, Ishita Gupta, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi
Abstract:
Partisan bias in LLMs has been evaluated to assess political leanings, typically through a broad lens and largely in Western contexts. We move beyond identifying general leanings to examine harmful, adversarial representational associations around political leaders and parties. To do so, we create datasets \textit{NeutQA-440} (non-adversarial prompts) and \textit{AdverQA-440} (adversarial prompts), which probe models for comparative plausibility judgments across the USA and India. Results show high susceptibility to biased partisan associations and pronounced asymmetries (e.g., substantially more favorable associations for U.S. Democrats than Republicans) alongside mixed-polarity concentration around India's BJP, highlighting systemic risks and motivating standardized, cross-cultural evaluation.
Authors:Yuxuan Li, Sauvik Das, Hirokazu Shirado
Abstract:
There is growing interest in using Large Language Models as agents (LLM agents) for social simulations to inform policy, yet real-world adoption remains limited. This paper addresses the question: How can LLM agent simulations be made genuinely useful for policy? We report on a year-long iterative design engagement with a university emergency preparedness team. Across multiple design iterations, we iteratively developed a system of 13,000 LLM agents that simulate crowd movement and communication during a large-scale gathering under various emergency scenarios. These simulations informed actual policy implementation, shaping volunteer training, evacuation protocols, and infrastructure planning. Analyzing this process, we identify three design implications: start with verifiable scenarios and build trust gradually, use preliminary simulations to elicit tacit knowledge, and treat simulation and policy development as evolving together. These implications highlight actionable pathways to making LLM agent simulations that are genuinely useful for policy.
Authors:Faraz Faruqi, Amira Abdel-Rahman, Leandra Tejedor, Martin Nisser, Jiaji Li, Vrushank Phadnis, Varun Jampani, Neil Gershenfeld, Megan Hofmann, Stefanie Mueller
Abstract:
Recent developments in Generative AI enable creators to stylize 3D models based on text prompts. These methods change the 3D model geometry, which can compromise the model's structural integrity once fabricated. We present MechStyle, a system that enables creators to stylize 3D printable models while preserving their structural integrity. MechStyle accomplishes this by augmenting the Generative AI-based stylization process with feedback from a Finite Element Analysis (FEA) simulation. As the stylization process modifies the geometry to approximate the desired style, feedback from the FEA simulation reduces modifications to regions with increased stress. We evaluate the effectiveness of FEA simulation feedback in the augmented stylization process by comparing three stylization control strategies. We also investigate the time efficiency of our approach by comparing three adaptive scheduling strategies. Finally, we demonstrate MechStyle's user interface that allows users to generate stylized and structurally viable 3D models and provide five example applications.
Authors:Nabila Amadou, Kazi Injamamul Haque, Zerrin Yumak
Abstract:
3D Virtual Human technology is growing with several potential applications in health, education, business and telecommunications. Investigating the perception of these virtual humans can help guide to develop better and more effective applications. Recent developments show that the appearance of the virtual humans reached to a very realistic level. However, there is not yet adequate analysis on the perception of appearance and animation realism for emotionally expressive virtual humans. In this paper, we designed a user experiment and analyzed the effect of a realistic virtual human's appearance realism and animation realism in varying emotion conditions. We found that higher appearance realism and higher animation realism leads to higher social presence and higher attractiveness ratings. We also found significant effects of animation realism on perceived realism and emotion intensity levels. Our study sheds light into how appearance and animation realism effects the perception of highly realistic virtual humans in emotionally expressive scenarios and points out to future directions.
Authors:Siyi Liu, Kazi Injamamul Haque, Zerrin Yumak
Abstract:
Creating human digital doubles is becoming easier and much more accessible to everyone using consumer grade devices. In this work, we investigate how avatar style (realistic vs cartoon) and avatar familiarity (self, acquaintance, unknown person) affect self/other-identification, perceived realism, affinity and social presence with a controlled offline experiment. We created two styles of avatars (realistic-looking MetaHumans and cartoon-looking ReadyPlayerMe avatars) and facial animations stimuli for them using performance capture. Questionnaire responses demonstrate that higher appearance realism leads to a higher level of identification, perceived realism and social presence. However, avatars with familiar faces, especially those with high appearance realism, lead to a lower level of identification, perceived realism, and affinity. Although participants identified their digital doubles as their own, they consistently did not like their avatars, especially of realistic appearance. But they were less critical and more forgiving about their acquaintance's or an unknown person's digital double.
Authors:Jungeun Lee, Kyungah Lee, Inseok Hwang, SoHyun Park, Young-Ho Kim
Abstract:
Social narratives are known to help autistic children understand and navigate social situations through stories. To ensure effectiveness, however, the materials need to be customized to reflect each child's unique behavioral context, requiring considerable time and effort for parents to practice at home. We present AutiHero, a generative AI-based social narrative system for behavioral guidance, which supports parents to create personalized stories for their autistic children and read them together. AutiHero generates text and visual illustrations that reflect their children's interests, target behaviors, and everyday contexts. In a two-week deployment study with 16 autistic child-parent dyads, parents created 218 stories and read an average of 4.25 stories per day, demonstrating a high level of engagement. AutiHero also provided an effective, low-demanding means to guide children's social behaviors, encouraging positive change. We discuss the implications of generative AI-infused tools to empower parents in guiding their children's behaviors, fostering their social learning.
Authors:Migyeong Yang, Kyungah Lee, Jinyoung Han, SoHyun Park, Young-Ho Kim
Abstract:
Journaling can potentially serve as an effective method for autistic adolescents to improve narrative skills. However, its text-centric nature and high executive functioning demands present barriers to practice. We present Autiverse, an AI-guided multimodal journaling app for tablets that scaffolds storytelling through conversational prompts and visual supports. Autiverse elicits key details through a stepwise dialogue with peer-like, customizable AI and composes them into an editable four-panel comic strip. Through a two-week deployment study with 10 autistic adolescent-parent dyads, we examine how Autiverse supports autistic adolescents to organize their daily experience and emotion. Autiverse helped them construct coherent narratives, while enabling parents to learn additional details of their child's events and emotions. The customized AI peer created a comfortable space for sharing, fostering enjoyment and a strong sense of agency. We discuss the implications of designing technologies that complement autistic adolescents' strengths while ensuring their autonomy and safety in sharing experiences.
Authors:Yifang Wang, Yifan Qian, Xiaoyu Qi, Yian Yin, Shengqi Dang, Ziqing Qian, Benjamin F. Jones, Nan Cao, Dashun Wang
Abstract:
Understanding the broad impact of science and science funding is critical to ensuring that science investments and policies align with societal needs. Existing research links science funding to the output of scientific publications but largely leaves out the downstream uses of science and the myriad ways in which investing in science may impact human society. As funders seek to allocate scarce funding resources across a complex research landscape, there is an urgent need for informative and transparent tools that allow for comprehensive assessments and visualization of the impact of funding. Here we present Funding the Frontier (FtF), a visual analysis system for researchers, funders, policymakers, university leaders, and the broad public to analyze multidimensional impacts of funding and make informed decisions regarding research investments and opportunities. The system is built on a massive data collection that connects 7M research grants to 140M scientific publications, 160M patents, 10.9M policy documents, 800K clinical trials, and 5.8M newsfeeds, with 1.8B citation linkages among these entities, systematically linking science funding to its downstream impacts. As such, Funding the Frontier is distinguished by its multifaceted impact analysis framework. The system incorporates diverse impact metrics and predictive models that forecast future investment opportunities into an array of coordinated views, allowing for easy exploration of funding and its outcomes. We evaluate the effectiveness and usability of the system using case studies and expert interviews. Feedback suggests that our system not only fulfills the primary analysis needs of its target users, but the rich datasets of the complex science ecosystem and the proposed analysis framework also open new avenues for both visualization and the science of science research.
Authors:Leon Hannig, Annika Bush, Meltem Aksoy, Tim Trappen, Steffen Becker, Greta Ontrup
Abstract:
As the use of LLM chatbots by students and researchers becomes more prevalent, universities are pressed to develop AI strategies. One strategy that many universities pursue is to customize pre-trained LLM as-a-service (LLMaaS). While most studies on LLMaaS chatbots prioritize technical adaptations, we focus on psychological effects of user-salient customizations, such as interface changes. We assume that such customizations influence users' perception of the system and are therefore important in guiding safe and appropriate use. In a field study, we examine how students and employees (N = 526) at a German university perceive and use their institution's customized LLMaaS chatbot compared to ChatGPT. Participants using both systems (n = 116) reported greater trust, higher perceived privacy and less experienced hallucinations with their university's customized LLMaaS chatbot in contrast to ChatGPT. We discuss theoretical implications for research on calibrated trust, and offer guidance on the design and deployment of LLMaaS chatbots.
Authors:Shyama Sastha Krishnamoorthy Srinivasan, Mohan Kumar, Pushpendra Singh
Abstract:
The shift away from multigenerational families to nuclear families in India has created a growing need to support older adults living independently. While technology can help address this gap, older adults' limited exposure to newer technology restricts the adoption of such solutions. However, they remain comfortable with long-standing technologies like television (TV). This study explores their daily technology usage and challenges, aiming to determine whether TV can be leveraged to improve their quality of life. We examined how TV systems could be enhanced to assist older adults with tasks such as staying connected, receiving health alerts, and ensuring security. Using a participatory design approach, we developed video probes using the prototype of the TV-based application and interviewed 27 older adults to assess its acceptance and usability. Our findings demonstrate older adults' strong interest in a TV-based solution and a preference for familiar technology to support security, independence, and wellbeing.
Authors:Kuangshi Ai, Haichao Miao, Zhimin Li, Chaoli Wang, Shusen Liu
Abstract:
Recent advances in multi-modal large language models (MLLMs) have enabled increasingly sophisticated autonomous visualization agents capable of translating user intentions into data visualizations. However, measuring progress and comparing different agents remains challenging, particularly in scientific visualization (SciVis), due to the absence of comprehensive, large-scale benchmarks for evaluating real-world capabilities. This position paper examines the various types of evaluation required for SciVis agents, outlines the associated challenges, provides a simple proof-of-concept evaluation example, and discusses how evaluation benchmarks can facilitate agent self-improvement. We advocate for a broader collaboration to develop a SciVis agentic evaluation benchmark that would not only assess existing capabilities but also drive innovation and stimulate future development in the field.
Authors:Matthew O. Jackson, Qiaozhu Me, Stephanie W. Wang, Yutong Xie, Walter Yuan, Seth Benzell, Erik Brynjolfsson, Colin F. Camerer, James Evans, Brian Jabarian, Jon Kleinberg, Juanjuan Meng, Sendhil Mullainathan, Asuman Ozdaglar, Thomas Pfeiffer, Moshe Tennenholtz, Robb Willer, Diyi Yang, Teng Ye
Abstract:
We discuss the three main areas comprising the new and emerging field of "AI Behavioral Science". This includes not only how AI can enhance research in the behavioral sciences, but also how the behavioral sciences can be used to study and better design AI and to understand how the world will change as AI and humans interact in increasingly layered and complex ways.
Authors:Niklas Elmqvist, Eve Hoggan, Hans-Jörg Schulz, Marianne Graves Petersen, Peter Dalsgaard, Ira Assent, Olav W. Bertelsen, Akhil Arora, Kaj Grønbæk, Susanne Bødker, Clemens Nylandsted Klokmose, Rachel Charlotte Smith, Sebastian Hubenschmid, Christoph A. Johns, Gabriela Molina León, Anton Wolter, Johannes Ellemose, Vaishali Dhanoa, Simon Aagaard Enni, Mille Skovhus Lunding, Karl-Emil Kjær Bilstrup, Juan Sánchez Esquivel, Luke Connelly, Rafael Pablos Sarabia, Morten Birk, Joachim Nyborg, Stefanie Zollmann, Tobias Langlotz, Meredith Siang-Yun Chou, Jens Emil Sloth Grønbæk, Michael Wessely, Yijing Jiang, Caroline Berger, Duosi Dai, Michael Mose Biskjaer, Germán Leiva, Jonas Frich, Eva Eriksson, Kim Halskov, Thorbjørn Mikkelsen, Nearchos Potamitis, Michel Yildirim, Arvind Srinivasan, Jeanette Falk, Nanna Inie, Ole Sejer Iversen, Hugo Andersson
Abstract:
AI's transformative impact on work, education, and everyday life makes it as much a political artifact as a technological one. Current AI models are opaque, centralized, and overly generic. The algorithmic automation they provide threatens human agency and democratic values in both workplaces and daily life. To confront such challenges, we turn to Scandinavian Participatory Design (PD), which was devised in the 1970s to face a similar threat from mechanical automation. In the PD tradition, technology is seen not just as an artifact, but as a locus of democracy. Drawing from this tradition, we propose Participatory AI as a PD approach to human-centered AI that applies five PD principles to four design challenges for algorithmic automation. We use concrete case studies to illustrate how to treat AI models less as proprietary products and more as shared socio-technical systems that enhance rather than diminish human agency, human dignity, and human values.
Authors:Si Chen, Xiuxiu Tang, Alison Cheng, Nitesh Chawla, G. Alex Ambrose, Ronald Metoyer
Abstract:
Generative AI is reshaping higher education, yet research has focused largely on students, while instructors remain understudied despite their central role in mediating adoption and modeling responsible use. We present the \textit{AI Academy}, a faculty development program that combined AI exploration with pedagogical reflection and peer learning. Rather than a course evaluated for outcomes, the Academy provided a setting to study how instructors build AI literacies in relation to tools, policies, peer practices, and institutional supports. We studied 25 instructors through pre/post surveys, learning logs, and facilitator interviews. Findings show AI literacy gains alongside new insights. We position instructors as designers of responsible AI practices and contribute a replicable program model, a co-constructed survey instrument, and design insights for professional development that adapts to evolving tools and fosters ethical discussion.
Authors:Andrew G. Breithaupt, Nayoung Choi, James D. Finch, Jeanne M. Powell, Arin L. Nelson, Oz A. Alon, Howard J. Rosen, Jinho D. Choi
Abstract:
Early detection of Alzheimer's disease and related dementias (ADRD) is critical for timely intervention, yet most diagnoses are delayed until advanced stages. While comprehensive patient narratives are essential for accurate diagnosis, prior work has largely focused on screening studies that classify cognitive status from interactions rather than supporting the diagnostic process. We designed voice-interactive conversational agents, leveraging large language models (LLMs), to elicit narratives relevant to ADRD from patients and informants. We evaluated the agent with 30 adults with suspected ADRD through conversation analysis (n=30), user surveys (n=19), and clinical validation against blinded specialist interviews (n=24). Symptoms detected by the agent aligned well with those identified by specialists across symptoms. Users appreciated the agent's patience and systematic questioning, which supported engagement and expression of complex, hard-to-describe experiences. This preliminary work suggests conversational agents may serve as structured front-end tools for dementia assessment, highlighting interaction design considerations in sensitive healthcare contexts.
Authors:Dominic Petrak, Thy Thy Tran, Iryna Gurevych
Abstract:
Although LLM-based conversational agents demonstrate strong fluency and coherence, they still produce undesirable behaviors (errors) that are challenging to prevent from reaching users during deployment. Recent research leverages large language models (LLMs) to detect errors and guide response-generation models toward improvement. However, current LLMs struggle to identify errors not explicitly specified in their instructions, such as those arising from updates to the response-generation model or shifts in user behavior. In this work, we introduce Automated Error Discovery, a framework for detecting and defining errors in conversational AI, and propose SEEED (Soft Clustering Extended Encoder-Based Error Detection), as an encoder-based approach to its implementation. We enhance the Soft Nearest Neighbor Loss by amplifying distance weighting for negative samples and introduce Label-Based Sample Ranking to select highly contrastive examples for better representation learning. SEEED outperforms adapted baselines -- including GPT-4o and Phi-4 -- across multiple error-annotated dialogue datasets, improving the accuracy for detecting unknown errors by up to 8 points and demonstrating strong generalization to unknown intent detection.
Authors:Giulia Botta, Marco Botta, Cristina Gena, Alessandro Mazzei, Massimo Donini, Alberto Lillo
Abstract:
Social robots are increasingly experimented in public and assistive settings, but their accessibility for Deaf users remains quite underexplored. Italian Sign Language (LIS) is a fully-fledged natural language that relies on complex manual and non-manual components. Enabling robots to communicate using LIS could foster more inclusive human robot interaction, especially in social environments such as hospitals, airports, or educational settings. This study investigates whether a commercial social robot, Pepper, can produce intelligible LIS signs and short signed LIS sentences. With the help of a Deaf student and his interpreter, an expert in LIS, we co-designed and implemented 52 LIS signs on Pepper using either manual animation techniques or a MATLAB based inverse kinematics solver. We conducted a exploratory user study involving 12 participants proficient in LIS, both Deaf and hearing. Participants completed a questionnaire featuring 15 single-choice video-based sign recognition tasks and 2 open-ended questions on short signed sentences. Results shows that the majority of isolated signs were recognized correctly, although full sentence recognition was significantly lower due to Pepper's limited articulation and temporal constraints. Our findings demonstrate that even commercially available social robots like Pepper can perform a subset of LIS signs intelligibly, offering some opportunities for a more inclusive interaction design. Future developments should address multi-modal enhancements (e.g., screen-based support or expressive avatars) and involve Deaf users in participatory design to refine robot expressivity and usability.
Authors:Lingyao Li, Renkai Ma, Zhaoqian Xue, Junjie Xiong
Abstract:
While Large Language Models (LLMs) are rapidly integrating into daily life, research on their risks often remains lab-based and disconnected from the problems users encounter "in the wild." While recent HCI research has begun to explore these user-facing risks, it typically concentrates on a singular LLM chatbot like ChatGPT or an isolated risk like privacy. To gain a holistic understanding of multi-risk across LLM chatbots, we analyze online discussions on Reddit around seven major LLM chatbots through the U.S. NIST's AI Risk Management Framework. We find that user-reported risks are unevenly distributed and platform-specific. While "Valid and Reliable" risk is the most frequently mentioned, each product also exhibits a unique "risk fingerprint;" for instance, user discussions associate GPT more with "Safe" and "Fair" issues, Gemini with "Privacy," and Claude with "Secure and Resilient" risks. Furthermore, the nature of these risks differs by their prevalence: less frequent risks like "Explainability" and "Privacy" manifest as nuanced user trade-offs, more common ones like "Fairness" are experienced as direct personal harms. Our findings reveal gaps between risks reported by system-centered studies and by users, highlighting the need for user-centered approaches that support users in their daily use of LLM chatbots.
Authors:Kunlin Cai, Jinghuai Zhang, Ying Li, Zhiyuan Wang, Xun Chen, Tianshi Li, Yuan Tian
Abstract:
The immersive nature of XR introduces a fundamentally different set of security and privacy (S&P) challenges due to the unprecedented user interactions and data collection that traditional paradigms struggle to mitigate. As the primary architects of XR applications, developers play a critical role in addressing novel threats. However, to effectively support developers, we must first understand how they perceive and respond to different threats. Despite the growing importance of this issue, there is a lack of in-depth, threat-aware studies that examine XR S&P from the developers' perspective. To fill this gap, we interviewed 23 professional XR developers with a focus on emerging threats in XR. Our study addresses two research questions aiming to uncover existing problems in XR development and identify actionable paths forward. By examining developers' perceptions of S&P threats, we found that: (1) XR development decisions (e.g., rich sensor data collection, user-generated content interfaces) are closely tied to and can amplify S&P threats, yet developers are often unaware of these risks, resulting in cognitive biases in threat perception; and (2) limitations in existing mitigation methods, combined with insufficient strategic, technical, and communication support, undermine developers' motivation, awareness, and ability to effectively address these threats. Based on these findings, we propose actionable and stakeholder-aware recommendations to improve XR S&P throughout the XR development process. This work represents the first effort to undertake a threat-aware, developer-centered study in the XR domain -- an area where the immersive, data-rich nature of the XR technology introduces distinctive challenges.
Authors:Nathan DeVrio, Vimal Mollyn, Chris Harrison
Abstract:
The ability to track a user's arm pose could be valuable in a wide range of applications, including fitness, rehabilitation, augmented reality input, life logging, and context-aware assistants. Unfortunately, this capability is not readily available to consumers. Systems either require cameras, which carry privacy issues, or utilize multiple worn IMUs or markers. In this work, we describe how an off-the-shelf smartphone and smartwatch can work together to accurately estimate arm pose. Moving beyond prior work, we take advantage of more recent ultra-wideband (UWB) functionality on these devices to capture absolute distance between the two devices. This measurement is the perfect complement to inertial data, which is relative and suffers from drift. We quantify the performance of our software-only approach using off-the-shelf devices, showing it can estimate the wrist and elbow joints with a \hl{median positional error of 11.0~cm}, without the user having to provide training data.
Authors:Vimal Mollyn, Nathan DeVrio, Chris Harrison
Abstract:
The ability to detect touch events on uninstrumented, everyday surfaces has been a long-standing goal for mixed reality systems. Prior work has shown that virtual interfaces bound to physical surfaces offer performance and ergonomic benefits over tapping at interfaces floating in the air. A wide variety of approaches have been previously developed, to which we contribute a new headset-integrated technique called \systemname. We use a combination of a computer-triggered camera and one or more infrared emitters to create structured shadows, from which we can accurately estimate hover distance (mean error of 6.9~mm) and touch contact (98.0\% accuracy). We discuss how our technique works across a range of conditions, including surface material, interaction orientation, and environmental lighting.
Authors:Gabriel Spadon, Oladapo Oyebode, Camilo M. Botero, Tushar Sharma, Floris Goerlandt, Ronald Pelot
Abstract:
This paper presents an overview of a human-centered initiative aimed at strengthening climate resilience along Nova Scotia's Eastern Shore. This region, a collection of rural villages with deep ties to the sea, faces existential threats from climate change that endanger its way of life. Our project moves beyond a purely technical response, weaving together expertise from Computer Science, Industrial Engineering, and Coastal Geography to co-create tools with the community. By integrating generational knowledge of residents, particularly elders, through the Eastern Shore Citizen Science Coastal Monitoring Network, this project aims to collaborate in building a living digital archive. This effort is hosted under Dalhousie University's Transforming Climate Action (TCA) initiative, specifically through its Transformative Adaptations to Social-Ecological Climate Change Trajectories (TranSECT) and TCA Artificial Intelligence (TCA-AI) projects. This work is driven by a collaboration model in which student teams work directly with residents. We present a detailed project timeline and a replicable model for how technology can support traditional communities, enabling them to navigate climate transformation more effectively.
Authors:Vimal Mollyn, Chris Harrison
Abstract:
In augmented and virtual reality (AR/VR) experiences, a user's arms and hands can provide a convenient and tactile surface for touch input. Prior work has shown on-body input to have significant speed, accuracy, and ergonomic benefits over in-air interfaces, which are common today. In this work, we demonstrate high accuracy, bare hands (i.e., no special instrumentation of the user) skin input using just an RGB camera, like those already integrated into all modern XR headsets. Our results show this approach can be accurate, and robust across diverse lighting conditions, skin tones, and body motion (e.g., input while walking). Finally, our pipeline also provides rich input metadata including touch force, finger identification, angle of attack, and rotation. We believe these are the requisite technical ingredients to more fully unlock on-skin interfaces that have been well motivated in the HCI literature but have lacked robust and practical methods.
Authors:Manuel Schmidt, Daniel A. Keim, Frederik L. Dennig
Abstract:
Factuality evaluation of large language model (LLM) outputs requires decomposing text into discrete "atomic" facts. However, existing definitions of atomicity are underspecified, with empirical results showing high disagreement among annotators, both human and model-based, due to unresolved ambiguity in fact decomposition. We present a visual analytics concept to expose and analyze annotation inconsistencies in fact extraction. By visualizing semantic alignment, granularity and referential dependencies, our approach aims to enable systematic inspection of extracted facts and facilitate convergence through guided revision loops, establishing a more stable foundation for factuality evaluation benchmarks and improving LLM evaluation.
Authors:Yutong Lin, Suyuan Liu, Kaiwen Guo, Haohua Du, Chao Liu, Xiang-Yang Li
Abstract:
In the mobile internet era, managing limited attention amid information overload is crucial for enhancing collaboration and information delivery. However, current attention-aware systems often depend on wearables or personalized data, limiting their scalability and cross-context adaptability. Inspired by psychological theories, we attempt to treat mobile notifications as naturally occurring external distractions and infer users' attention states based on their response behaviors and contextual information. Our goal is to build an attention-aware model that does not rely on personalized historical data or complex subjective input, while ensuring strong cold-start capability and cross-context adaptability. To this end, We design a field study framework integrating subjective and objective data, closely aligned with real-world external distractions (i.e., mobile notifications). Through field studies, we construct a fine-grained and interpretable dataset centered on the relationship among current context - external distractions - subjective attention. Through our field studies, we conduct an in-depth analysis of the relationships among users' response behaviors, response motivations, contextual information, and attention states. Building on our findings, we propose AttenTrack, a lightweight, privacy-friendly attention awareness model with strong cold-start capability. The model relies solely on non-privacy-sensitive objective data available on mobile devices, and can be applied to a variety of attention management tasks. In addition, we will publicly release the constructed dataset to support future research and advance the field of mobile attention awareness.
Authors:Larissa R. de S. Shibata, Ankit A. Ravankar, Jose Victorio Salazar Luces, Yasuhisa Hirata
Abstract:
Shopping plays a significant role in shaping consumer identity and social integration. However, for individuals with visual impairments, navigating in supermarkets and identifying products can be an overwhelming and challenging experience. This paper presents an AI-based shopping assistant prototype designed to enhance the autonomy and inclusivity of visually impaired individuals in supermarket environments. The system integrates multiple technologies, including computer vision, speech recognition, text-to-speech synthesis, and indoor navigation, into a single, user-friendly platform. Using cameras for ArUco marker detection and real-time environmental scanning, the system helps users navigate the store, identify product locations, provide real-time auditory guidance, and gain context about their surroundings. The assistant interacts with the user through voice commands and multimodal feedback, promoting a more dynamic and engaging shopping experience. The system was evaluated through experiments, which demonstrated its ability to guide users effectively and improve their shopping experience. This paper contributes to the development of inclusive AI-driven assistive technologies aimed at enhancing accessibility and user independence for the shopping experience.
Authors:Shyama Sastha Krishnamoorthy Srinivasan, Mohan Kumar, Pushpendra Singh
Abstract:
Personal Health Informatics (PHI), which leverages digital tools and information systems to support health assessment and self-care, holds promise for empowering individuals and transforming healthcare delivery. However, barriers to its adoption remain underexplored in the Indian context. This study investigates PHI adoption among Indian users and stakeholders using a multi-method approach. An awareness survey (n = 87) examined the usage of wearables and general PHI engagement, followed by semi-structured interviews (n = 22) that explored motivations, usage patterns, and health information sources. Qualitative analysis revealed that while PHI is valued for health monitoring and shared/collective care, its adoption is hindered by factors such as low health literacy, usability challenges, and mistrust in digital health platforms. Further stakeholder interviews and co-design workshops informed the development of a Figma-based prototype, which was evaluated for usability. Based on these findings, we offer design recommendations for an integrated, user-controlled PHI platform featuring accessible analytics and verifiable health information. Our insights highlight the socio-technical challenges of PHI adoption in India and underscore the need for reliable, user-centric solutions to support proactive healthcare.
Authors:Clara Sayffaerth, Annika Köhler, Julian Rasch, Albrecht Schmidt, Florian Müller
Abstract:
Transferring knowledge across generations is fundamental to human civilization, yet the challenge of passing on complex practical skills persists. Methods without a physically present instructor, such as videos, often fail to explain complex manual tasks, where spatial and social factors are critical. Technologies such as eXtended Reality and Artificial Intelligence hold the potential to retain expert knowledge and facilitate the creation of tailored, contextualized, and asynchronous explanations regardless of time and place. In contrast to videos, the learner's perspective can be different from the recorded perspective in XR. This paper investigates the impact of asynchronous first- and third-person perspectives and gaze visualizations on efficiency, feeling of embodiment, and connectedness during manual tasks. The empirical results of our study (N=36) show that the first-person perspective is better in quantitative measures and preferred by users. We identify best practices for presenting preserved knowledge and provide guidelines for designing future systems.
Authors:Subham Kutum, Abhijit Sinha, Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Mahesh Chandra Govil
Abstract:
Numerous methods have been proposed to enhance Keyword Spotting (KWS) in adult speech, but children's speech presents unique challenges for KWS systems due to its distinct acoustic and linguistic characteristics. This paper introduces a zero-shot KWS approach that leverages state-of-the-art self-supervised learning (SSL) models, including Wav2Vec2, HuBERT and Data2Vec. Features are extracted layer-wise from these SSL models and used to train a Kaldi-based DNN KWS system. The WSJCAM0 adult speech dataset was used for training, while the PFSTAR children's speech dataset was used for testing, demonstrating the zero-shot capability of our method. Our approach achieved state-of-the-art results across all keyword sets for children's speech. Notably, the Wav2Vec2 model, particularly layer 22, performed the best, delivering an ATWV score of 0.691, a MTWV score of 0.7003 and probability of false alarm and probability of miss of 0.0164 and 0.0547 respectively, for a set of 30 keywords. Furthermore, age-specific performance evaluation confirmed the system's effectiveness across different age groups of children. To assess the system's robustness against noise, additional experiments were conducted using the best-performing layer of the best-performing Wav2Vec2 model. The results demonstrated a significant improvement over traditional MFCC-based baseline, emphasizing the potential of SSL embeddings even in noisy conditions. To further generalize the KWS framework, the experiments were repeated for an additional CMU dataset. Overall the results highlight the significant contribution of SSL features in enhancing Zero-Shot KWS performance for children's speech, effectively addressing the challenges associated with the distinct characteristics of child speakers.
Authors:Adam Coscia, Shunan Guo, Eunyee Koh, Alex Endert
Abstract:
As multi-turn dialogues with large language models (LLMs) grow longer and more complex, how can users better evaluate and review progress on their conversational goals? We present OnGoal, an LLM chat interface that helps users better manage goal progress. OnGoal provides real-time feedback on goal alignment through LLM-assisted evaluation, explanations for evaluation results with examples, and overviews of goal progression over time, enabling users to navigate complex dialogues more effectively. Through a study with 20 participants on a writing task, we evaluate OnGoal against a baseline chat interface without goal tracking. Using OnGoal, participants spent less time and effort to achieve their goals while exploring new prompting strategies to overcome miscommunication, suggesting tracking and visualizing goals can enhance engagement and resilience in LLM dialogues. Our findings inspired design implications for future LLM chat interfaces that improve goal communication, reduce cognitive load, enhance interactivity, and enable feedback to improve LLM performance.
Authors:Prakash Shukla, Paul Parsons
Abstract:
Framing -- how designers define and reinterpret problems, shape narratives, and guide audience understanding -- is central to design practice. Yet in visualization research, framing has been examined mostly through its rhetorical and perceptual effects on audiences, leaving its role in the design process underexplored. This study addresses that gap by analyzing publicly available podcasts and book chapters in which over 80 professional visualization designers reflect on their work. We find that framing is a pervasive, iterative activity, evident in scoping problems, interpreting data, aligning with stakeholder goals, and shaping narrative direction. Our analysis identifies the conditions that trigger reframing and the strategies practitioners use to navigate uncertainty and guide design. These findings position framing as a core dimension of visualization practice and underscore the need for research and education to support the interpretive and strategic judgment that practitioners exercise throughout the design process.
Authors:Haitham Abdelsalam, Chanelle Montpetit, Arash Harirpoush, Maryse Fortin, Yiming Xiao
Abstract:
Chronic neck pain is a prevalent condition that affects millions of individuals worldwide, causing significant individual suffering and socioeconomic burdens. Although exercise rehabilitation is a staple in relieving pain and improving muscle function for the condition, traditional one-on-one rehabilitation sessions are costly and suffer from poor adherence and accessibility for the patients. Thanks to the increasing accessibility and recent advancements in sensing and display technology, virtual reality (VR) offers the potential to tackle the challenges in traditional exercise rehabilitation, particularly through gamification. However, still in its infancy, VR-based neck exercise rehabilitation lacks exploration in effective gamification strategies and existing prototypes. To address the knowledge gap, we conduct an exploratory study on the gamification strategies for VR-based cervical rehabilitation exercises by using chin tuck and neck range of motion exercises as examples. Specifically, with different game themes, we investigate a survival and level progression strategy for muscle strengthening (chin tuck) exercise for the first time, and the suitability of ambient reward for a neck range of motion exercise. Through a preliminary user study, we assess the proposed novel VR neck rehabilitation games and they demonstrate excellent usability, engagement, and perceived health value.
Authors:Haoyang Yang, Elliott H. Faa, Weijian Liu, Shunan Guo, Duen Horng Chau, Yalong Yang
Abstract:
Exploring and comprehending relevant academic literature is a vital yet challenging task for researchers, especially given the rapid expansion in research publications. This task fundamentally involves sensemaking - interpreting complex, scattered information sources to build understanding. While emerging immersive analytics tools have shown cognitive benefits like enhanced spatial memory and reduced mental load, they predominantly focus on information synthesis (e.g., organizing known documents). In contrast, the equally important information foraging phase - discovering and gathering relevant literature - remains underexplored within immersive environments, hindering a complete sensemaking workflow. To bridge this gap, we introduce LitForager, an interactive literature exploration tool designed to facilitate information foraging of research literature within an immersive sensemaking workflow using network-based visualizations and multimodal interactions. Developed with WebXR and informed by a formative study with researchers, LitForager supports exploration guidance, spatial organization, and seamless transition through a 3D literature network. An observational user study with 15 researchers demonstrated LitForager's effectiveness in supporting fluid foraging strategies and spatial sensemaking through its multimodal interface.
Authors:Xiang Li, Per Ola Kristensson
Abstract:
Object selection in Mixed Reality (MR) becomes particularly challenging in dense or occluded environments, where traditional mid-air ray-casting often leads to ambiguity and reduced precision. We present two complementary techniques: (1) a real-time Bezier Curve selection paradigm guided by finger curvature, enabling expressive one-handed trajectories, and (2) an on-body disambiguation mechanism that projects the four nearest candidates onto the user's forearm via proximity-based mapping. Together, these techniques combine flexible, user-controlled selection with tactile, proprioceptive disambiguation. We evaluated their independent and joint effects in a 2x2 within-subjects study (N = 24), crossing interaction paradigm (Bezier Curve vs. Linear Ray) with interaction medium (Mid-air vs. On-body). Results show that on-body disambiguation significantly reduced selection errors and physical demand while improving perceived performance, hedonic quality, and user preference. Bezier input provided effective access to occluded targets but incurred longer task times and greater effort under some conditions. We conclude with design implications for integrating curved input and on-body previews to support precise, adaptive selection in immersive environments.
Authors:Stanislav Pozdniakov, Jonathan Brazil, Oleksandra Poquet, Stephan Krusche, Santiago Berrezueta-Guzman, Shazia Sadiq, Hassan Khosravi
Abstract:
In the contemporary educational landscape, particularly in large classroom settings, discussion forums have become a crucial tool for promoting interaction and addressing student queries. These forums foster a collaborative learning environment where students engage with both the teaching team and their peers. However, the sheer volume of content generated in these forums poses two significant interconnected challenges: How can we effectively identify common misunderstandings that arise in student discussions? And once identified, how can instructors use these insights to address them effectively? This paper explores the approach to integrating large language models (LLMs) and Retrieval-Augmented Generation (RAG) to tackle these challenges. We then demonstrate the approach Misunderstanding to Mastery (M2M) with authentic data from three computer science courses, involving 1355 students with 2878 unique posts, followed by an evaluation with five instructors teaching these courses. Results show that instructors found the approach promising and valuable for teaching, effectively identifying misunderstandings and generating actionable insights. Instructors highlighted the need for more fine-grained groupings, clearer metrics, validation of the created resources, and ethical considerations around data anonymity.
Authors:Timon Merk, Saeed Salehi, Richard M. Koehler, Qiming Cui, Maria Olaru, Amelia Hahn, Nicole R. Provenza, Simon Little, Reza Abbasi-Asl, Phil A. Starr, Wolf-Julian Neumann
Abstract:
Neural decoding of pathological and physiological states can enable patient-individualized closed-loop neuromodulation therapy. Recent advances in pre-trained large-scale foundation models offer the potential for generalized state estimation without patient-individual training. Here we present a foundation model trained on chronic longitudinal deep brain stimulation recordings spanning over 24 days. Adhering to long time-scale symptom fluctuations, we highlight the extended context window of 30 minutes. We present an optimized pre-training loss function for neural electrophysiological data that corrects for the frequency bias of common masked auto-encoder loss functions due to the 1-over-f power law. We show in a downstream task the decoding of Parkinson's disease symptoms with leave-one-subject-out cross-validation without patient-individual training.
Authors:Yuekun Wu, Yik Lung Pang, Andrea Cavallaro, Changjae Oh
Abstract:
Human-robot teaming (HRT) systems often rely on large-scale datasets of human and robot interactions, especially for close-proximity collaboration tasks such as human-robot handovers. Learning robot manipulation policies from raw, real-world image data requires a large number of robot-action trials in the physical environment. Although simulation training offers a cost-effective alternative, the visual domain gap between simulation and robot workspace remains a major limitation. We introduce a method for training HRT policies, focusing on human-to-robot handovers, solely from RGB images without the need for real-robot training or real-robot data collection. The goal is to enable the robot to reliably receive objects from a human with stable grasping while avoiding collisions with the human hand. The proposed policy learner leverages sparse-view Gaussian Splatting reconstruction of human-to-robot handover scenes to generate robot demonstrations containing image-action pairs captured with a camera mounted on the robot gripper. As a result, the simulated camera pose changes in the reconstructed scene can be directly translated into gripper pose changes. Experiments in both Gaussian Splatting reconstructed scene and real-world human-to-robot handover experiments demonstrate that our method serves as a new and effective representation for the human-to-robot handover task, contributing to more seamless and robust HRT.
Authors:Jindu Wang, Ke Zhou, Haoyu Ren, Per Ola Kristensson, Xiang Li
Abstract:
Window management in virtual reality (VR) remains a challenging task due to the spatial complexity and physical demands of current interaction methods. We introduce Handows, a palm-based interface that enables direct manipulation of spatial windows through familiar smartphone-inspired gestures on the user's non-dominant hand. Combining ergonomic layout design with body-centric input and passive haptics, Handows supports four core operations: window selection, closure, positioning, and scaling. We evaluate Handows in a user study (N=15) against two common VR techniques (virtual hand and controller) across these core window operations. Results show that Handows significantly reduces physical effort and head movement while improving task efficiency and interaction precision. A follow-up case study (N=8) demonstrates Handows' usability in realistic multitasking scenarios, highlighting user-adapted workflows and spontaneous layout strategies. Our findings suggest the potential of embedding mobile-inspired metaphors into proprioceptive body-centric interfaces to support low-effort and spatially coherent interaction in VR.
Authors:Jürgen Bernard, Mara Solen, Helen Novak Lauscher, Kurtis Stewart, Kendall Ho, Tamara Munzner
Abstract:
At the beginning of the COVID-19 pandemic, HealthLink BC (HLBC) rapidly integrated physicians into the triage process of their virtual healthcare service to improve patient outcomes and satisfaction with this service and preserve health care system capacity. We present the design and implementation of a visual analytics tool, VIVA (Virtual healthcare Interactions using Visual Analytics), to support HLBC in analysing various forms of usage data from the service. We abstract HLBC's data and data analysis tasks, which we use to inform our design of VIVA. We also present the interactive workflow abstraction of Scan, Act, Adapt. We validate VIVA's design through three case studies with stakeholder domain experts. We also propose the Controllability Through Configuration model to conduct and analyze design studies, and discuss architectural evolution of VIVA through that lens. It articulates configuration, both that specified by a developer or technical power user and that constructed automatically through log data from previous interactive sessions, as a bridge between the rigidity of hardwired programming and the time-consuming implementation of full end-user interactivity.
Availability: Supplemental materials at https://osf.io/wv38n
Authors:Shiyang Lai, Junsol Kim, Nadav Kunievsky, Yujin Potter, James Evans
Abstract:
Current AI systems minimize risk by enforcing ideological neutrality, yet this may introduce automation bias by suppressing cognitive engagement in human decision-making. We conducted randomized trials with 2,500 participants to test whether culturally biased AI enhances human decision-making. Participants interacted with politically diverse GPT-4o variants on information evaluation tasks. Partisan AI assistants enhanced human performance, increased engagement, and reduced evaluative bias compared to non-biased counterparts, with amplified benefits when participants encountered opposing views. These gains carried a trust penalty: participants underappreciated biased AI and overcredited neutral systems. Exposing participants to two AIs whose biases flanked human perspectives closed the perception-performance gap. These findings complicate conventional wisdom about AI neutrality, suggesting that strategic integration of diverse cultural biases may foster improved and resilient human decision-making.
Authors:Ding Xia, Naoto Inoue, Qianru Qiu, Kotaro Kikuchi
Abstract:
Colors play a crucial role in the design of vector graphic documents by enhancing visual appeal, facilitating communication, improving usability, and ensuring accessibility. In this context, color recommendation involves suggesting appropriate colors to complete or refine a design when one or more colors are missing or require alteration. Traditional methods often struggled with these challenges due to the complex nature of color design and the limited data availability. In this study, we explored the use of pretrained Large Language Models (LLMs) and their commonsense reasoning capabilities for color recommendation, raising the question: Can pretrained LLMs serve as superior designers for color recommendation tasks? To investigate this, we developed a robust, rigorously validated pipeline, ColorGPT, that was built by systematically testing multiple color representations and applying effective prompt engineering techniques. Our approach primarily targeted color palette completion by recommending colors based on a set of given colors and accompanying context. Moreover, our method can be extended to full palette generation, producing an entire color palette corresponding to a provided textual description. Experimental results demonstrated that our LLM-based pipeline outperformed existing methods in terms of color suggestion accuracy and the distribution of colors in the color palette completion task. For the full palette generation task, our approach also yielded improvements in color diversity and similarity compared to current techniques.
Authors:Lei Zhang, Shuyao Zhou, Amna Liaqat, Tinney Mak, Brian Berengard, Emily Qian, Andrés Monroy-Hernández
Abstract:
Despite their potential to enhance children's learning experiences, AI-enabled AR technologies are predominantly used in ways that position children as consumers rather than creators. We introduce Capybara, an AR-based and AI-powered visual programming environment that empowers children to create, customize, and program 3D characters overlaid onto the physical world. Capybara enables children to create virtual characters and accessories using text-to-3D generative AI models, and to animate these characters through auto-rigging and body tracking. In addition, our system employs vision-based AI models to recognize physical objects, allowing children to program interactive behaviors between virtual characters and their physical surroundings. We demonstrate the expressiveness of Capybara through a set of novel AR experiences. We conducted user studies with 20 children in the United States and Argentina. Our findings suggest that Capybara can empower children to harness AI in authoring personalized and engaging AR experiences that seamlessly bridge the virtual and physical worlds.
Authors:Minsuk Chang, Yao Wang, Huichen Will Wang, Yuanhong Zhou, Andreas Bulling, Cindy Xiong Bearfield
Abstract:
Accounting for individual differences can improve the effectiveness of visualization design. While the role of visual attention in visualization interpretation is well recognized, existing work often overlooks how this behavior varies based on visual literacy levels. Based on data from a 235-participant user study covering three visualization tests (mini-VLAT, CALVI, and SGL), we show that distinct attention patterns in visual data exploration can correlate with participants' literacy levels: While experts (high-scorers) generally show a strong attentional focus, novices (low-scorers) focus less and explore more. We then propose two computational models leveraging these insights: Lit2Sal -- a novel visual saliency model that predicts observer attention given their visualization literacy level, and Sal2Lit -- a model to predict visual literacy from human visual attention data. Our quantitative and qualitative evaluation demonstrates that Lit2Sal outperforms state-of-the-art saliency models with literacy-aware considerations. Sal2Lit predicts literacy with 86% accuracy using a single attention map, providing a time-efficient supplement to literacy assessment that only takes less than a minute. Taken together, our unique approach to consider individual differences in salience models and visual attention in literacy assessments paves the way for new directions in personalized visual data communication to enhance understanding.
Authors:Pragya Singh, Ankush Gupta, Mohan Kumar, Pushpendra Singh
Abstract:
Emotional and mental well-being are vital components of quality of life, and with the rise of smart devices like smartphones, wearables, and artificial intelligence (AI), new opportunities for monitoring emotions in everyday settings have emerged. However, for AI algorithms to be effective, they require high-quality data and accurate annotations. As the focus shifts towards collecting emotion data in real-world environments to capture more authentic emotional experiences, the process of gathering emotion annotations has become increasingly complex. This work explores the challenges of everyday emotion data collection from the perspectives of key stakeholders. We collected 75 survey responses, performed 32 interviews with the public, and 3 focus group discussions (FGDs) with 12 mental health professionals. The insights gained from a total of 119 stakeholders informed the development of our framework, AnnoSense, designed to support everyday emotion data collection for AI. This framework was then evaluated by 25 emotion AI experts for its clarity, usefulness, and adaptability. Lastly, we discuss the potential next steps and implications of AnnoSense for future research in emotion AI, highlighting its potential to enhance the collection and analysis of emotion data in real-world contexts.
Authors:Tingying He, Jason Dykes, Petra Isenberg, Tobias Isenberg
Abstract:
We present a new comprehensive theory for explaining, exploring, and using pattern as a visual variable in visualization. Although patterns have long been used for data encoding and continue to be valuable today, their conceptual foundations are precarious: the concepts and terminology used across the research literature and in practice are inconsistent, making it challenging to use patterns effectively and to conduct research to inform their use. To address this problem, we conduct a comprehensive cross-disciplinary literature review that clarifies ambiguities around the use of "pattern" and "texture". As a result, we offer a new consistent treatment of pattern as a composite visual variable composed of structured groups of graphic primitives that can serve as marks for encoding data individually and collectively. This new and widely applicable formulation opens a sizable design space for the visual variable pattern, which we formalize as a new system comprising three sets of variables: the spatial arrangement of primitives, the appearance relationships among primitives, and the retinal visual variables that characterize individual primitives. We show how our pattern system relates to existing visualization theory and highlight opportunities for visualization design. We further explore patterns based on complex spatial arrangements, demonstrating explanatory power and connecting our conceptualization to broader theory on maps and cartography. An author version and additional materials are available on OSF: osf.io/z7ae2.
Authors:Laura Schütz, Sasan Matinfar, Ulrich Eck, Daniel Roth, Nassir Navab
Abstract:
In Augmented Reality (AR), virtual objects interact with real objects. However, the lack of physicality of virtual objects leads to the absence of natural sonic interactions. When virtual and real objects collide, either no sound or a generic sound is played. Both lead to an incongruent multisensory experience, reducing interaction and object realism. Unlike in Virtual Reality (VR) and games, where predefined scenes and interactions allow for the playback of pre-recorded sound samples, AR requires real-time sound synthesis that dynamically adapts to novel contexts and objects to provide audiovisual congruence during interaction. To enhance real-virtual object interactions in AR, we propose a framework for context-aware sounds using methods from computer vision to recognize and segment the materials of real objects. The material's physical properties and the impact dynamics of the interaction are used to generate material-based sounds in real-time using physical modelling synthesis. In a user study with 24 participants, we compared our congruent material-based sounds to a generic sound effect, mirroring the current standard of non-context-aware sounds in AR applications. The results showed that material-based sounds led to significantly more realistic sonic interactions. Material-based sounds also enabled participants to distinguish visually similar materials with significantly greater accuracy and confidence. These findings show that context-aware, material-based sonic interactions in AR foster a stronger sense of realism and enhance our perception of real-world surroundings.
Authors:Zhengxin Zhang, Shufang Qian, Yi Wang, Xiao Liu, Thuong Hoang, Chetan Arora, Jingjing Zhang, Henry Been Lirn Duh
Abstract:
Cybersickness significantly impacts the user experience in VR applications. Locomotion tunneling is a widely adopted technique for mitigating cybersickness in susceptible users. However, there is a lack of research investigating the effects of prolonged use of locomotion tunneling among novice users. To fill this gap, we used VRChat as our experimental platform. We recruited 24 novice VR users, defined as participants with no prior experience using immersive virtual environments. We collected five days of data within a one-week period. The results indicated that participants exhibited significant mitigation to cybersickness by Day 4. However, a change in the VR scene on Day 5 led to a notable increase in cybersickness symptoms. Qualitative feedback revealed participant-perceived causes of cybersickness and suggested that the effectiveness of locomotion tunneling was limited in some scenarios. Finally, we discussed the limitations of the study and proposed directions for future research.
Authors:Songlin Xu, Xinyu Zhang
Abstract:
In this paper, we introduce an AI-mediated framework that can provide intelligent feedback to augment human cognition. Specifically, we leverage deep reinforcement learning (DRL) to provide adaptive time pressure feedback to improve user performance in a math arithmetic task. Time pressure feedback could either improve or deteriorate user performance by regulating user attention and anxiety. Adaptive time pressure feedback controlled by a DRL policy according to users' real-time performance could potentially solve this trade-off problem. However, the DRL training and hyperparameter tuning may require large amounts of data and iterative user studies. Therefore, we propose a dual-DRL framework that trains a regulation DRL agent to regulate user performance by interacting with another simulation DRL agent that mimics user cognition behaviors from an existing dataset. Our user study demonstrates the feasibility and effectiveness of the dual-DRL framework in augmenting user performance, in comparison to the baseline group.
Authors:Stefan Graser, Martin Schrepp, Stephan Böhm
Abstract:
Innovative technologies, such as Augmented Reality (AR), introduce new interaction paradigms, demanding the identification of software requirements during the software development process. In general, design recommendations are related to this, supporting the design of applications positively and meeting stakeholder needs. However, current research lacks context-specific AR design recommendations. This study addresses this gap by identifying and analyzing practical AR design recommendations relevant to the evaluation phase of the User-Centered Design (UCD) process. We rely on an existing dataset of Mixed Reality (MR) design recommendations. We applied a multi-method approach by (1) extending the dataset with AR-specific recommendations published since 2020, (2) classifying the identified recommendations using a NLP classification approach based on a pre-trained Sentence Transformer model, (3) summarizing the content of all topics, and (4) evaluating their relevance concerning AR in Corporate Training (CT) both based on a qualitative Round Robin approach with five experts. As a result, an updated dataset of 597 practitioner design recommendations, classified into 84 topics, is provided with new insights into their applicability in the context of AR in CT. Based on this, 32 topics with a total of 284 statements were evaluated as relevant for AR in CT. This research directly contributes to the authors' work for extending their AR-specific User Experience (UX) measurement approach, supporting AR authors in targeting the improvement of AR applications for CT scenarios.
Authors:Ayan Biswas, Terece L. Turton, Nishath Rajiv Ranasinghe, Shawn Jones, Bradley Love, William Jones, Aric Hagberg, Han-Wei Shen, Nathan DeBardeleben, Earl Lawrence
Abstract:
We present VizGenie, a self-improving, agentic framework that advances scientific visualization through large language model (LLM) by orchestrating of a collection of domain-specific and dynamically generated modules. Users initially access core functionalities--such as threshold-based filtering, slice extraction, and statistical analysis--through pre-existing tools. For tasks beyond this baseline, VizGenie autonomously employs LLMs to generate new visualization scripts (e.g., VTK Python code), expanding its capabilities on-demand. Each generated script undergoes automated backend validation and is seamlessly integrated upon successful testing, continuously enhancing the system's adaptability and robustness. A distinctive feature of VizGenie is its intuitive natural language interface, allowing users to issue high-level feature-based queries (e.g., ``visualize the skull"). The system leverages image-based analysis and visual question answering (VQA) via fine-tuned vision models to interpret these queries precisely, bridging domain expertise and technical implementation. Additionally, users can interactively query generated visualizations through VQA, facilitating deeper exploration. Reliability and reproducibility are further strengthened by Retrieval-Augmented Generation (RAG), providing context-driven responses while maintaining comprehensive provenance records. Evaluations on complex volumetric datasets demonstrate significant reductions in cognitive overhead for iterative visualization tasks. By integrating curated domain-specific tools with LLM-driven flexibility, VizGenie not only accelerates insight generation but also establishes a sustainable, continuously evolving visualization practice. The resulting platform dynamically learns from user interactions, consistently enhancing support for feature-centric exploration and reproducible research in scientific visualization.
Authors:Xin Sun, Lei Wang, Yue Li, Jie Li, Massimo Poesio, Julian Frommel, Koen Hinriks, Jiahuan Pei
Abstract:
With large language models (LLMs) on the rise, in-game interactions are shifting from rigid commands to natural conversations. However, the impacts of LLMs on player performance and game experience remain underexplored. This work explores LLM's role as a co-builder during gameplay, examining its impact on task performance, usability, and player experience. Using Minecraft as a sandbox, we present an LLM-assisted interface that engages players through natural language, aiming to facilitate creativity and simplify complex gaming commands. We conducted a mixed-methods study with 30 participants, comparing LLM-assisted and command-based interfaces across simple and complex game tasks. Quantitative and qualitative analyses reveal that the LLM-assisted interface significantly improves player performance, engagement, and overall game experience. Additionally, task complexity has a notable effect on player performance and experience across both interfaces. Our findings highlight the potential of LLM-assisted interfaces to revolutionize virtual experiences, emphasizing the importance of balancing intuitiveness with predictability, transparency, and user agency in AI-driven, multimodal gaming environments.
Authors:Yifei Zhang, Lin-Ping Yuan, Yuheng Zhao, Jielin Feng, Siming Chen
Abstract:
Particle effects are widely used in games and animation to simulate natural phenomena or stylized visual effects. However, creating effect artworks is challenging for non-expert users due to their lack of specialized skills, particularly in finding particle effects with kinematic behaviors that match their intent. To address these issues, we present KinemaFX, a kinematic-driven interactive system, to assist non-expert users in constructing customized particle effect artworks. We propose a conceptual model of particle effects that captures both semantic features and kinematic behaviors. Based on the model, KinemaFX adopts a workflow powered by Large Language Models (LLMs) that supports intent expression through combined semantic and kinematic inputs, while enabling implicit preference-guided exploration and subsequent creation of customized particle effect artworks based on exploration results. Additionally, we developed a kinematic-driven method to facilitate efficient interactive particle effect search within KinemaFX via structured representation and measurement of particle effects. To evaluate KinemaFX, we illustrate usage scenarios and conduct a user study employing an ablation approach. Evaluation results demonstrate that KinemaFX effectively supports users in efficiently and customarily creating particle effect artworks.
Authors:Zijian Zhang, Pan Chen, Fangshi Du, Runlong Ye, Oliver Huang, Michael Liut, Alán Aspuru-Guzik
Abstract:
Efficiently navigating and understanding academic papers is crucial for scientific progress. Traditional linear formats like PDF and HTML can cause cognitive overload and obscure a paper's hierarchical structure, making it difficult to locate key information. While LLM-based chatbots offer summarization, they often lack nuanced understanding of specific sections, may produce unreliable information, and typically discard the document's navigational structure. Drawing insights from a formative study on academic reading practices, we introduce TreeReader, a novel language model-augmented paper reader. TreeReader decomposes papers into an interactive tree structure where each section is initially represented by an LLM-generated concise summary, with underlying details accessible on demand. This design allows users to quickly grasp core ideas, selectively explore sections of interest, and verify summaries against the source text. A user study was conducted to evaluate TreeReader's impact on reading efficiency and comprehension. TreeReader provides a more focused and efficient way to navigate and understand complex academic literature by bridging hierarchical summarization with interactive exploration.
Authors:Yue Luo, Xinyan Yu, Tram Thi Minh Tran, Marius Hoggenmueller
Abstract:
Uncertainty is an inherent aspect of autonomous vehicle (AV) decision-making, yet it is rarely communicated to pedestrians, which hinders transparency. This study investigates how AV uncertainty can be conveyed through two approaches: explicit communication (confidence percentage displays) and implicit communication (vehicle motion cues), across different confidence levels (high and low). Through a within-subject VR experiment (N=26), we evaluated these approaches in a crossing scenario, assessing interface qualities (visibility and intuitiveness), how well the information conveyed the vehicle's level of confidence, and their impact on participants' perceived safety, trust, and user experience. Our results show that explicit communication is more effective and preferred for conveying uncertainty, enhancing safety, trust, and user experience. Conversely, implicit communication introduces ambiguity, especially when AV confidence is low. This research provides empirical insights into how uncertainty communication shapes pedestrian interpretation of AV behaviour and offer design guidance for external interfaces that integrate uncertainty as a communicative element.
Authors:Yuheng Zhao, Xueli Shu, Liwen Fan, Lin Gao, Yu Zhang, Siming Chen
Abstract:
Visual analytics (VA) is typically applied to complex data, thus requiring complex tools. While visual analytics empowers analysts in data analysis, analysts may get lost in the complexity occasionally. This highlights the need for intelligent assistance mechanisms. However, even the latest LLM-assisted VA systems only provide help when explicitly requested by the user, making them insufficiently intelligent to offer suggestions when analysts need them the most. We propose a ProactiveVA framework in which LLM-powered UI agent monitors user interactions and delivers context-aware assistance proactively. To design effective proactive assistance, we first conducted a formative study analyzing help-seeking behaviors in user interaction logs, identifying when users need proactive help, what assistance they require, and how the agent should intervene. Based on this analysis, we distilled key design requirements in terms of intent recognition, solution generation, interpretability and controllability. Guided by these requirements, we develop a three-stage UI agent pipeline including perception, reasoning, and acting. The agent autonomously perceives users' needs from VA interaction logs, providing tailored suggestions and intuitive guidance through interactive exploration of the system. We implemented the framework in two representative types of VA systems, demonstrating its generalizability, and evaluated the effectiveness through an algorithm evaluation, case and expert study and a user study. We also discuss current design trade-offs of proactive VA and areas for further exploration.
Authors:Choro Ulan Uulu, Mikhail Kulyabin, Layan Etaiwi, Nuno Miguel Martins Pacheco, Jan Joosten, Kerstin Röse, Filippos Petridis, Jan Bosch, Helena Holmström Olsson
Abstract:
Computer-Aided Engineering (CAE) enables simulation experts to optimize complex models, but faces challenges in user experience (UX) that limit efficiency and accessibility. While artificial intelligence (AI) has demonstrated potential to enhance CAE processes, research integrating these fields with a focus on UX remains fragmented. This paper presents a multivocal literature review (MLR) examining how AI enhances UX in CAE software across both academic research and industry implementations. Our analysis reveals significant gaps between academic explorations and industry applications, with companies actively implementing LLMs, adaptive UIs, and recommender systems while academic research focuses primarily on technical capabilities without UX validation. Key findings demonstrate opportunities in AI-powered guidance, adaptive interfaces, and workflow automation that remain underexplored in current research. By mapping the intersection of these domains, this study provides a foundation for future work to address the identified research gaps and advance the integration of AI to improve CAE user experience.
Authors:Yaomin Jiang, Levin Brinkmann, Anne-Marie Nussberger, Ivan Soraperra, Jean-François Bonnefon, Iyad Rahwan
Abstract:
Partner selection is crucial for cooperation and hinges on communication. As artificial agents, especially those powered by large language models (LLMs), become more autonomous, intelligent, and persuasive, they compete with humans for partnerships. Yet little is known about how humans select between human and AI partners and adapt under AI-induced competition pressure. We constructed a communication-based partner selection game and examined the dynamics in hybrid mini-societies of humans and bots powered by a state-of-the-art LLM. Through three experiments (N = 975), we found that bots, though more prosocial than humans and linguistically distinguishable, were not selected preferentially when their identity was hidden. Instead, humans misattributed bots' behaviour to humans and vice versa. Disclosing bots' identity induced a dual effect: it reduced bots' initial chances of being selected but allowed them to gradually outcompete humans by facilitating human learning about the behaviour of each partner type. These findings show how AI can reshape social interaction in mixed societies and inform the design of more effective and cooperative hybrid systems.
Authors:Ruohao Li, Jiawei Li, Jia Sun, Zhiqing Wu, Zisu Li, Ziyan Wang, Ge Lin Kan, Mingming Fan
Abstract:
Reminiscence activities, which involve recalling and sharing past experiences, have proven beneficial for improving cognitive function, mood, and overall well-being. However, urbanization has led to the disappearance of familiar environments, removing visual and audio cues for effective reminiscence. While old photos can serve as visual cues to aid reminiscence, it is challenging for people to reconstruct the reminisced content and environment that are not in the photos. Virtual reality (VR) and artificial intelligence (AI) offer the ability to reconstruct an immersive environment with dynamic content and to converse with people to help them gradually reminisce. We designed RemVerse, an AI-empowered VR prototype aimed to support reminiscence activities. Integrating generative models and AI agent into a VR environment, RemVerse helps older adults reminisce with AI-generated visual cues and interactive dialogues. Our user study with 14 older adults showed that RemVerse effectively supported reminiscence activities by triggering, concretizing, and deepening personal memories, while fostering increased engagement and autonomy among older adults. Based on our findings, we proposed design implications to make reminiscence activities in AI-assisted VR more accessible and engaging for older adults.
Authors:Zheng Ning, Leyang Li, Daniel Killough, JooYoung Seo, Patrick Carrington, Yapeng Tian, Yuhang Zhao, Franklin Mingzhe Li, Toby Jia-Jun Li
Abstract:
Videos offer rich audiovisual information that can support people in performing activities of daily living (ADLs), but they remain largely inaccessible to blind or low-vision (BLV) individuals. In cooking, BLV people often rely on non-visual cues, such as touch, taste, and smell, to navigate their environment, making it difficult to follow the predominantly audiovisual instructions found in video recipes. To address this problem, we introduce AROMA, an AI system that provides timely responses to the user based on real-time, context-aware assistance by integrating non-visual cues perceived by the user, a wearable camera feed, and video recipe content. AROMA uses a mixed-initiative approach: it responds to user requests while also proactively monitoring the video stream to offer timely alerts and guidance. This collaborative design leverages the complementary strengths of the user and AI system to align the physical environment with the video recipe, helping the user interpret their current cooking state and make sense of the steps. We evaluated AROMA through a study with eight BLV participants and offered insights for designing interactive AI systems to support BLV individuals in performing ADLs.
Authors:Ammar Ahmed, Ali Shariq Imran
Abstract:
This systematic literature review examines the role of large language models (LLMs) in UI/UX design, synthesizing findings from 38 peer-reviewed studies published between 2022 and 2025. We identify key LLMs in use, including GPT-4, Gemini, and PaLM, and map their integration across the design lifecycle, from ideation to evaluation. Common practices include prompt engineering, human-in-the-loop workflows, and multimodal input. While LLMs are reshaping design processes, challenges such as hallucination, prompt instability, and limited explainability persist. Our findings highlight LLMs as emerging collaborators in design, and we propose directions for the ethical, inclusive, and effective integration of these technologies.
Authors:Baiqiao Zhang, Xiangxian Li, Chao Zhou, Xinyu Gai, Juan Liu, Xue Yang, Xiaojuan Ma, Yong-jin Liu, Yulong Bian
Abstract:
The low-intrusion and automated personality assessment is receiving increasing attention in psychology and human-computer interaction fields. This study explores an interactive approach for personality assessment, focusing on the multiplicity of personality representation. We propose a framework of Gamified Personality Assessment through Multi-Personality Representations (Multi-PR GPA). The framework leverages Large Language Models to empower virtual agents with different personalities. These agents elicit multifaceted human personality representations through engaging in interactive games. Drawing upon the multi-type textual data generated throughout the interaction, it achieves two modes of personality assessment (i.e., Direct Assessment and Questionnaire-based Assessment) and provides interpretable insights. Grounded in the classic Big Five personality theory, we developed a prototype system and conducted a user study to evaluate the efficacy of Multi-PR GPA. The results affirm the effectiveness of our approach in personality assessment and demonstrate its superior performance when considering the multiplicity of personality representation.
Authors:Franklin Mingzhe Li, Akihiko Oharazawa, Chloe Qingyu Zhu, Misty Fan, Daisuke Sato, Chieko Asakawa, Patrick Carrington
Abstract:
Makeup plays a vital role in self-expression, identity, and confidence - yet remains an underexplored domain for assistive technology, especially for people with vision impairments. While existing tools support isolated tasks such as color identification or product labeling, they rarely address the procedural complexity of makeup routines: coordinating step sequences, managing product placement, and assessing the final look with accessible feedback. To understand the real-world process, we conducted a contextual inquiry with 15 visually impaired makeup users, capturing real-time makeup application behaviors and their step-by-step information needs and assessment approaches. Our findings reveal embodied, tactile-first strategies; persistent challenges in blending, symmetry, and assessment; and a desire for honest, real-time, goal-aligned feedback. We also interviewed five professional makeup artists, who reviewed participant makeup videos and provided expert responses to participant-raised questions and assessment practices. We contribute a taxonomy of feedback needs in non-visual makeup, and outline design implications for future assistive systems - emphasizing hands-free, conversational interaction and context-aware, procedural support for expressive and independent beauty practices.
Authors:Franklin Mingzhe Li, Kaitlyn Ng, Bin Zhu, Patrick Carrington
Abstract:
Cooking plays a vital role in everyday independence and well-being, yet remains challenging for people with vision impairments due to limited support for tracking progress and receiving contextual feedback. Object status - the condition or transformation of ingredients and tools - offers a promising but underexplored foundation for context-aware cooking support. In this paper, we present OSCAR (Object Status Context Awareness for Recipes), a technical pipeline that explores the use of object status recognition to enable recipe progress tracking in non-visual cooking. OSCAR integrates recipe parsing, object status extraction, visual alignment with cooking steps, and time-causal modeling to support real-time step tracking. We evaluate OSCAR on 173 instructional videos and a real-world dataset of 12 non-visual cooking sessions recorded by BLV individuals in their homes. Our results show that object status consistently improves step prediction accuracy across vision-language models, and reveal key factors that impact performance in real-world conditions, such as implicit tasks, camera placement, and lighting. We contribute the pipeline of context-aware recipe progress tracking, an annotated real-world non-visual cooking dataset, and design insights to guide future context-aware assistive cooking systems.
Authors:Xingyu Xiao, Hongxu Zhu, Jingang Liang, Jiejuan Tong, Haitao Wang
Abstract:
Human error remains a dominant risk driver in safety-critical sectors such as nuclear power, aviation, and healthcare, where seemingly minor mistakes can cascade into catastrophic outcomes. Although decades of research have produced a rich repertoire of mitigation techniques, persistent limitations: scarce high-quality data, algorithmic opacity, and residual reliance on expert judgment, continue to constrain progress. This review synthesizes recent advances at the intersection of risk-informed decision making, human reliability assessment (HRA), artificial intelligence (AI), and cognitive science to clarify how their convergence can curb human-error risk. We first categorize the principal forms of human error observed in complex sociotechnical environments and outline their quantitative impact on system reliability. Next, we examine risk-informed frameworks that embed HRA within probabilistic and data-driven methodologies, highlighting successes and gaps. We then survey cognitive and human-performance models, detailing how mechanistic accounts of perception, memory, and decision-making enrich error prediction and complement HRA metrics. Building on these foundations, we critically assess AI-enabled techniques for real-time error detection, operator-state estimation, and AI-augmented HRA workflows. Across these strands, a recurring insight emerges: integrating cognitive models with AI-based analytics inside risk-informed HRA pipelines markedly enhances predictive fidelity, yet doing so demands richer datasets, transparent algorithms, and rigorous validation. Finally, we identify promising research directions, coupling resilience engineering concepts with grounded theory, operationalizing the iceberg model of incident causation, and establishing cross-domain data consortia, to foster a multidisciplinary paradigm that elevates human reliability in high-stakes systems.
Authors:Kai Qin, Kexin Du, Yimeng Chen, Yueyan Liu, Jie Cai, Zhiqiang Nie, Nan Gao, Guohui Wei, Shengzhu Wang, Chun Yu
Abstract:
The integration of various AI tools creates a complex socio-technical environment where employee-customer interactions form the core of work practices. This study investigates how customer service representatives (CSRs) at the power grid service customer service call center perceive AI assistance in their interactions with customers. Through a field visit and semi-structured interviews with 13 CSRs, we found that AI can alleviate some traditional burdens during the call (e.g., typing and memorizing) but also introduces new burdens (e.g., earning, compliance, psychological burdens). This research contributes to a more nuanced understanding of AI integration in organizational settings and highlights the efforts and burdens undertaken by CSRs to adapt to the updated system.
Authors:Tanusree Sharma, Yu-Yun Tseng, Lotus Zhang, Ayae Ide, Kelly Avery Mack, Leah Findlater, Danna Gurari, Yang Wang
Abstract:
Blind and low vision (BLV) individuals use Generative AI (GenAI) tools to interpret and manage visual content in their daily lives. While such tools can enhance the accessibility of visual content and so enable greater user independence, they also introduce complex challenges around visual privacy. In this paper, we investigate the current practices and future design preferences of blind and low vision individuals through an interview study with 21 participants. Our findings reveal a range of current practices with GenAI that balance privacy, efficiency, and emotional agency, with users accounting for privacy risks across six key scenarios, such as self-presentation, indoor/outdoor spatial privacy, social sharing, and handling professional content. Our findings reveal design preferences, including on-device processing, zero-retention guarantees, sensitive content redaction, privacy-aware appearance indicators, and multimodal tactile mirrored interaction methods. We conclude with actionable design recommendations to support user-centered visual privacy through GenAI, expanding the notion of privacy and responsible handling of others data.
Authors:Yaman Yu, Mohi, Aishi Debroy, Xin Cao, Karen Rudolph, Yang Wang
Abstract:
AI companions are increasingly popular among teenagers, yet current platforms lack safeguards to address developmental risks and harmful normalization. Despite growing concerns, little is known about how parents and developmental psychology experts assess these interactions or what protections they consider necessary. We conducted 26 semi structured interviews with parents and experts, who reviewed real world youth GenAI companion conversation snippets. We found that stakeholders assessed risks contextually, attending to factors such as youth maturity, AI character age, and how AI characters modeled values and norms. We also identified distinct logics of assessment: parents flagged single events, such as a mention of suicide or flirtation, as high risk, whereas experts looked for patterns over time, such as repeated references to self harm or sustained dependence. Both groups proposed interventions, with parents favoring broader oversight and experts preferring cautious, crisis-only escalation paired with youth facing safeguards. These findings provide directions for embedding safety into AI companion design.
Authors:Mahika Phutane, Hayoung Jung, Matthew Kim, Tanushree Mitra, Aditya Vashistha
Abstract:
Large language models (LLMs) are increasingly under scrutiny for perpetuating identity-based discrimination in high-stakes domains such as hiring, particularly against people with disabilities (PwD). However, existing research remains largely Western-centric, overlooking how intersecting forms of marginalization--such as gender and caste--shape experiences of PwD in the Global South. We conduct a comprehensive audit of six LLMs across 2,820 hiring scenarios spanning diverse disability, gender, nationality, and caste profiles. To capture subtle intersectional harms and biases, we introduce ABLEIST (Ableism, Inspiration, Superhumanization, and Tokenism), a set of five ableism-specific and three intersectional harm metrics grounded in disability studies literature. Our results reveal significant increases in ABLEIST harms towards disabled candidates--harms that many state-of-the-art models failed to detect. These harms were further amplified by sharp increases in intersectional harms (e.g., Tokenism) for gender and caste-marginalized disabled candidates, highlighting critical blind spots in current safety tools and the need for intersectional safety evaluations of frontier models in high-stakes domains like hiring.
Authors:Pouya Shaeri, Ryan T. Woo, Yasaman Mohammadpour, Ariane Middel
Abstract:
Segmentation models achieve high accuracy on benchmarks but often fail in real-world domains by relying on spurious correlations instead of true object boundaries. We propose a human-in-the-loop interactive framework that enables interventional learning through targeted human corrections of segmentation outputs. Our approach treats human corrections as interventional signals that show when reliance on superficial features (e.g., color or texture) is inappropriate. The system learns from these interventions by propagating correction-informed edits across visually similar images, effectively steering the model toward robust, semantically meaningful features rather than dataset-specific artifacts. Unlike traditional annotation approaches that simply provide more training data, our method explicitly identifies when and why the model fails and then systematically corrects these failure modes across the entire dataset. Through iterative human feedback, the system develops increasingly robust representations that generalize better to novel domains and resist artifactual correlations. We demonstrate that our framework improves segmentation accuracy by up to 9 mIoU points (12-15\% relative improvement) on challenging cubemap data and yields 3-4$\times$ reductions in annotation effort compared to standard retraining, while maintaining competitive performance on benchmark datasets. This work provides a practical framework for researchers and practitioners seeking to build segmentation systems that are accurate, robust to dataset biases, data-efficient, and adaptable to real-world domains such as urban climate monitoring and autonomous driving.
Authors:Mohammed Almutairi, Charles Chiang, Haoze Guo, Matthew Belcher, Nandini Banerjee, Maria Milkowski, Svitlana Volkova, Daniel Nguyen, Tim Weninger, Michael Yankoski, Trenton W. Ford, Diego Gomez-Zara
Abstract:
Enabling users to create their own simulations offers a powerful way to study team dynamics and performance. We introduce VirTLab, a system that allows researchers and practitioners to design interactive, customizable simulations of team dynamics with LLM-based agents situated in 2D spatial environments. Unlike prior frameworks that restrict scenarios to predefined or static tasks, our approach enables users to build scenarios, assign roles, and observe how agents coordinate, move, and adapt over time. By bridging team cognition behaviors with scalable agent-based modeling, our system provides a testbed for investigating how environments influence coordination, collaboration, and emergent team behaviors. We demonstrate its utility by aligning simulated outcomes with empirical evaluations and a user study, underscoring the importance of customizable environments for advancing research on multi-agent simulations. This work contributes to making simulations accessible to both technical and non-technical users, supporting the design, execution, and analysis of complex multi-agent experiments.
Authors:Shunpei Norihama, Yuka Iwane, Jo Takezawa, Simo Hosio, Mari Hirano, Naomi Yamashita, Koji Yatani
Abstract:
Writing about personal experiences can improve well-being, but for family caregivers, fixed or user-initiated schedules often miss the right moments. Drawing on Construal Level Theory, we conducted a three-week field study with 47 caregivers using a chatbot that delivered daily reflective writing prompts and captured temporal, spatial, and social contexts. We collected 958 writing entries, resulting in 5,412 coded segments. Our Analysis revealed two reflective modes. Under proximal conditions, participants produced detailed, emotion-rich, and care recipient-focused narratives that supported emotional release. Under distal conditions, they generated calmer, self-focused, and analytic accounts that enabled objective reflection and cognitive reappraisal. Participants described trade-offs: proximity preserved vivid detail but limited objectivity, while distance enabled analysis but risked memory loss. This work contributes empirical evidence of how psychological distances shape reflective writing and proposes design implications for distance-aware Just-in-Time Adaptive Interventions for family caregivers' mental health support.
Authors:Bhavya Matam, Adamay Mann, Kachina Studer, Christian Gabbianelli, Sonia Castelo, John Liu, Claudio Silva, Dishita Turakhia
Abstract:
With the growing need to effectively support workforce upskilling in the manufacturing sector, virtual reality is gaining popularity as a scalable training solution. However, most current systems are designed as static, step-by-step tutorials and do not adapt to a learner's needs or cognitive load, which is a critical factor in learning and longterm retention. We address this limitation with CLAd-VR, an adaptive VR training system that integrates realtime EEG-based sensing to measure the learner's cognitive load and adapt instruction accordingly, specifically for domain-specific tasks in manufacturing. The system features a VR training module for a precision drilling task, designed with multimodal instructional elements including animations, text, and video. Our cognitive load sensing pipeline uses a wearable EEG device to capture the trainee's neural activity, which is processed through an LSTM model to classify their cognitive load as low, optimal, or high in real time. Based on these classifications, the system dynamically adjusts task difficulty and delivers adaptive guidance using voice guidance, visual cues, or ghost hand animations. This paper introduces CLAd-VR system's architecture, including the EEG sensing hardware, real-time inference model, and adaptive VR interface.
Authors:Seth Bernstein, Ashfin Rahman, Nadia Sharifi, Ariunjargal Terbish, Stephen MacNeil
Abstract:
Generative artificial intelligence (GenAI) has already had a big impact on computing education with prior research identifying many benefits. However, recent studies have also identified potential risks and harms. To continue maximizing AI benefits while addressing the harms and unintended consequences, we conducted a systematic literature review of research focusing on the risks, harms, and unintended consequences of GenAI in computing education. Our search of ACM DL, IEEE Xplore, and Scopus (2022-2025) resulted in 1,677 papers, which were then filtered to 224 based on our inclusion and exclusion criteria. Guided by best practices for systematic reviews, four reviewers independently extracted publication year, learner population, research method, contribution type, GenAI technology, and educational task information from each paper. We then coded each paper for concrete harm categories such as academic integrity, cognitive effects, and trust issues. Our analysis shows patterns in how and where harms appear, highlights methodological gaps and opportunities for more rigorous evidence, and identifies under-explored harms and student populations. By synthesizing these insights, we intend to equip educators, computing students, researchers, and developers with a clear picture of the harms associated with GenAI in computing education.
Authors:Yingjing Xiao, Zhichao Huang, Junbin Ren, Haichuan Song, Yang Gao, Yuting Bai, Zhanpeng Jin
Abstract:
Hand pose tracking is essential for advancing applications in human-computer interaction. Current approaches, such as vision-based systems and wearable devices, face limitations in portability, usability, and practicality. We present a novel wearable system that reconstructs 3D hand pose and estimates per-finger forces using a minimal ring-watch sensor setup. A ring worn on the finger integrates an inertial measurement unit (IMU) to capture finger motion, while a smartwatch-based single-channel electromyography (EMG) sensor on the wrist detects muscle activations. By leveraging the complementary strengths of motion sensing and muscle signals, our approach achieves accurate hand pose tracking and grip force estimation in a compact wearable form factor. We develop a dual-branch transformer network that fuses IMU and EMG data with cross-modal attention to predict finger joint positions and forces simultaneously. A custom loss function imposes kinematic constraints for smooth force variation and realistic force saturation. Evaluation with 20 participants performing daily object interaction gestures demonstrates an average Mean Per Joint Position Error (MPJPE) of 0.57 cm and a fingertip force estimation (RMSE: 0.213, r=0.76). We showcase our system in a real-time Unity application, enabling virtual hand interactions that respond to user-applied forces. This minimal, force-aware tracking system has broad implications for VR/AR, assistive prosthetics, and ergonomic monitoring.
Authors:Sophia Ppali, Haris Psallidopoulos, Marios Constantinides, Fotis Liarokapis
Abstract:
Virtual Reality (VR) is increasingly being used to support workplace well-being, but many interventions focus narrowly on a single activity or goal. Our work explores how VR can meet the diverse physical and mental needs of knowledge workers. We developed Tranquil Loom, a VR app offering stretching, guided meditation, and open exploration across four environments. The app includes an AI assistant that suggests activities based on users' emotional states. We conducted a two-phase mixed-methods study: (1) interviews with 10 knowledge workers to guide the app's design, and (2) deployment with 35 participants gathering usage data, well-being measures, and interviews. Results showed increases in mindfulness and reductions in anxiety. Participants enjoyed both structured and open-ended activities, often using the app playfully. While AI suggestions were used infrequently, they prompted ideas for future personalization. Overall, participants viewed VR as a flexible, ``drop-in'' tool, highlighting its value for situational rather than prescriptive well-being support.
Authors:Yaozheng Xia, Zaiping Zhu, Bo Pang, Shaorong Wang, Sheng Li
Abstract:
Gaze stabilization is critical for enabling fluid, accurate, and efficient interaction in immersive augmented reality (AR) environments, particularly during task-oriented visual behaviors. However, fixation sequences captured in active gaze tasks often exhibit irregular dispersion and systematic deviations from target locations, a variability primarily caused by the combined effects of human oculomotor physiology, insufficient AR headset tracking and calibration accuracy, and environmental disturbances, undermining interaction performance and visual engagement. To address this issue, we propose TimeGazer, which reformulates gaze stabilization as a sequence-to-sequence temporal regression problem, predicting idealized fixation trajectories for the target-fixation phase from historical gaze dynamics in the search phase. We present a synthetic data generation and blending strategy that produces spatially concentrated, target-centered fixation references aligned with task objectives, substantially enriching the training space and enhancing model generalization. We train and evaluate TimeGazer on a hybrid dataset of real and augmented gaze sequences collected via Microsoft HoloLens 2 from 54 participants across multiple prediction horizons. Through the user study, statistical results demonstrate that TimeGazer significantly improves interaction accuracy and reduces completion time, confirming that temporal modeling of predictive gaze stabilization can strengthen attentional consistency and responsiveness in task-driven AR interaction. These findings highlight the broader potential of TimeGazer for advancing adaptive gaze-based interfaces and temporal modeling research in immersive systems.
Authors:Matthew Varona, Maryam Hedayati, Matthew Kay, Carolina Nobre
Abstract:
"Theory figures" are a staple of theoretical visualization research. Common shapes such as Cartesian planes and flowcharts can be used not only to explain conceptual contributions, but to think through and refine the contribution itself. Yet, theory figures tend to be limited to a set of standard shapes, limiting the creative and expressive potential of visualization theory. In this work, we explore how the shapes used in theory figures afford different understandings and explanations of their underlying phenomena. We speculate on the value of visualizing theories using more expressive configurations, such as icebergs, horseshoes, Möbius strips, and BLT sandwiches. By reflecting on figure-making's generative role in the practice of theorizing, we conclude that theory is, in fact, shapes.
Authors:Yucheng Lu, Hubert Dariusz Zając, Veronika Cheplygina, Amelia Jiménez-Sánchez
Abstract:
Transfer learning is crucial for medical imaging, yet the selection of source datasets - which can impact the generalizability of algorithms, and thus patient outcomes - often relies on researchers' intuition rather than systematic principles. This study investigates these decisions through a task-based survey with machine learning practitioners. Unlike prior work that benchmarks models and experimental setups, we take a human-centered HCI perspective on how practitioners select source datasets. Our findings indicate that choices are task-dependent and influenced by community practices, dataset properties, and computational (data embedding), or perceived visual or semantic similarity. However, similarity ratings and expected performance are not always aligned, challenging a traditional "more similar is better" view. Participants often used ambiguous terminology, which suggests a need for clearer definitions and HCI tools to make them explicit and usable. By clarifying these heuristics, this work provides practical insights for more systematic source selection in transfer learning.
Authors:Helen Schneider, Svetlana Pavlitska, Helen Gremmelmaier, J. Marius Zöllner
Abstract:
Understanding human affect can be used in robotics, marketing, education, human-computer interaction, healthcare, entertainment, autonomous driving, and psychology to enhance decision-making, personalize experiences, and improve emotional well-being. This work presents a comprehensive overview of affect inference datasets that utilize continuous valence and arousal labels. We reviewed 25 datasets published between 2008 and 2024, examining key factors such as dataset size, subject distribution, sensor configurations, annotation scales, and data formats for valence and arousal values. While camera-based datasets dominate the field, we also identified several widely used multimodal combinations. Additionally, we explored the most common approaches to affect detection applied to these datasets, providing insights into the prevailing methodologies in the field. Our overview of sensor fusion approaches shows promising advancements in model improvement for valence and arousal inference.
Authors:Jade Kandel, Sriya Kasumarthi, Spiros Tsalikis, Chelsea Duppen, Daniel Szafir, Michael Lewek, Henry Fuchs, Danielle Szafir
Abstract:
Augmented reality (AR) offers promising opportunities to support movement-based activities, such as personal training or physical therapy, with real-time, spatially-situated visual cues. While many approaches leverage AR to guide motion, existing design guidelines focus on simple, upper-body movements within the user's field of view. We lack evidence-based design recommendations for guiding more diverse scenarios involving movements with varying levels of visibility and direction. We conducted an experiment to investigate how different visual encodings and perspectives affect motion guidance performance and usability, using three exercises that varied in visibility and planes of motion. Our findings reveal significant differences in preference and performance across designs. Notably, the best perspective varied depending on motion visibility and showing more information about the overall motion did not necessarily improve motion execution. We provide empirically-grounded guidelines for designing immersive, interactive visualizations for motion guidance to support more effective AR systems.
Authors:Robert Kasumba, Zeyu Lu, Dom CP Marticorena, Mingyang Zhong, Paul Beggs, Anja Pahor, Geetha Ramani, Imani Goffney, Susanne M Jaeggi, Aaron R Seitz, Jacob R Gardner, Dennis L Barbour
Abstract:
This study uses controlled simulations with known ground-truth parameters to evaluate how Distributional Latent Variable Models (DLVM) and Bayesian Distributional Active LEarning (DALE) perform in comparison to conventional Independent Maximum Likelihood Estimation (IMLE). DLVM integrates observations across multiple executive function tasks and individuals, allowing parameter estimation even under sparse or incomplete data conditions. DLVM consistently outperformed IMLE, especially under with smaller amounts of data, and converges faster to highly accurate estimates of the true distributions. In a second set of analyses, DALE adaptively guided sampling to maximize information gain, outperforming random sampling and fixed test batteries, particularly within the first 80 trials. These findings establish the advantages of combining DLVM's cross-task inference with DALE's optimal adaptive sampling, providing a principled basis for more efficient cognitive assessments.
Authors:Dom CP Marticorena, Chris Wissmann, Zeyu Lu, Dennis L Barbour
Abstract:
While adaptive experimental design has outgrown one-dimensional, staircase-based adaptations, most cognitive experiments still control a single factor and summarize performance with a scalar. We show a validation of a Bayesian, two-axis, active-classification approach, carried out in an immersive virtual testing environment for a 5-by-5 working-memory reconstruction task. Two variables are controlled: spatial load L (number of occupied tiles) and feature-binding load K (number of distinct colors) of items. Stimulus acquisition is guided by posterior uncertainty of a nonparametric Gaussian Process (GP) probabilistic classifier, which outputs a surface over (L, K) rather than a single threshold or max span value. In a young adult population, we compare GP-driven Adaptive Mode (AM) with a traditional adaptive staircase Classic Mode (CM), which varies L only at K = 3. Parity between the methods is achieved for this cohort, with an intraclass coefficient of 0.755 at K = 3. Additionally, AM reveals individual differences in interactions between spatial load and feature binding. AM estimates converge more quickly than other sampling strategies, demonstrating that only about 30 samples are required for accurate fitting of the full model.
Authors:Kyuha Jung, Tyler Kim, Yunan Chen
Abstract:
As teenagers increasingly turn to social media for health-related information, understanding the values of teen-targeted content has become important. Although videos on healthy lifestyles and self-improvement are gaining popularity on social media platforms like YouTube, little is known about how these videos benefit and engage with teenage viewers. To address this, we conducted a thematic analysis of 44 YouTube videos and 66,901 comments. We found that these videos provide various advice on teenagers' common challenges, use engaging narratives for authenticity, and foster teen-centered communities through comments. However, a few videos also gave misleading advice to adolescents that can be potentially harmful. Based on our findings, we discuss design implications for creating relatable and intriguing social media content for adolescents. Additionally, we suggest ways for social media platforms to promote healthier and safer experiences for teenagers.
Authors:Chengxin Zhang, Huizhong Guo, Zifei Wang, Fred Feng, Anuj Pradhan, Shan Bao
Abstract:
This study investigates how targeted training interventions can improve safe driver interaction with vehicle automation (VA) systems, focusing on Adaptive Cruise Control (ACC) and Lane Keeping Assist (LKA), both safety-critical advanced driver assistance systems (ADAS). Effective training reduces misuse and enhances road safety by promoting correct knowledge and application. A review of multiple automakers' owners' manuals revealed inconsistencies in describing ACC and LKA functions. Three training formats were compared: (1) owners' manual (OM), (2) knowledge-based (KB) with summarized operational guidelines and visual aids, and (3) skill-based hands-on practice in a driving simulator (SIM). Thirty-six participants with no prior VA experience were randomly assigned to one group. Safety-relevant outcomes - system comprehension (quiz scores) and real-world engagement (frequency and duration of activations) - were analyzed using mixed-effects and negative binomial models. KB training produced the greatest improvements in comprehension of system limitations, as well as safer engagement patterns. Compared with OM participants, KB participants achieved significantly higher quiz scores and engaged LKA and ACC more often (1.4 and 1.45 times, respectively); they also demonstrated greater awareness of scenarios requiring manual control, indicating reduced risk of inappropriate reliance. Older drivers exhibited longer activations overall, highlighting age-related differences in reliance and potential safety implications. Short, targeted training can significantly improve safe and effective VA system use, particularly for senior drivers. These results highlight training as a proactive safety intervention to reduce human-automation mismatch and enhance system reliability in real-world driving.
Authors:Jiexi Xu, Jiaqi Liu, Lanruo Wang, Su Liu
Abstract:
Large language model (LLM) agents are increasingly capable of orchestrating complex tasks in low-code environments. However, these agents often exhibit hallucinations and logical inconsistencies because their inherent reasoning mechanisms rely on probabilistic associations rather than genuine causal understanding. This paper introduces a new programming paradigm: Causal-Visual Programming (CVP), designed to address this fundamental issue by explicitly introducing causal structures into the workflow design. CVP allows users to define a simple "world model" for workflow modules through an intuitive low-code interface, effectively creating a Directed Acyclic Graph (DAG) that explicitly defines the causal relationships between modules. This causal graph acts as a crucial constraint during the agent's reasoning process, anchoring its decisions to a user-defined causal structure and significantly reducing logical errors and hallucinations by preventing reliance on spurious correlations. To validate the effectiveness of CVP, we designed a synthetic experiment that simulates a common real-world problem: a distribution shift between the training and test environments. Our results show that a causally anchored model maintained stable accuracy in the face of this shift, whereas a purely associative baseline model that relied on probabilistic correlations experienced a significant performance drop. The primary contributions of this study are: a formal definition of causal structures for workflow modules; the proposal and implementation of a CVP framework that anchors agent reasoning to a user-defined causal graph; and empirical evidence demonstrating the framework's effectiveness in enhancing agent robustness and reducing errors caused by causal confusion in dynamic environments. CVP offers a viable path toward building more interpretable, reliable, and trustworthy AI agents.
Authors:Momin N. Siddiqui, Nikki Nasseri, Adam Coscia, Roy Pea, Hari Subramonyam
Abstract:
As generative AI becomes part of everyday writing, questions of transparency and productive human effort are increasingly important. Educators, reviewers, and readers want to understand how AI shaped the process. Where was human effort focused? What role did AI play in the creation of the work? How did the interaction unfold? Existing approaches often reduce these dynamics to summary metrics or simplified provenance. We introduce DraftMarks, an augmented reading tool that surfaces the human-AI writing process through familiar physical metaphors. DraftMarks employs skeuomorphic encodings such as eraser crumbs to convey the intensity of revision, and masking tape or smudges to mark AI-generated content, simulating the process within the final written artifact. By using data from writer-AI interactions, DraftMarks' algorithm computes various collaboration metrics and writing traces. Through a formative study, we identified computational logic for different readership, and evaluated DraftMarks for its effectiveness in assessing AI co-authored writing.
Authors:Karthik S. Bhat, Jiayue Melissa Shi, Wenxuan Song, Dong Whi Yoo, Koustuv Saha
Abstract:
Screen use pervades daily life, shaping work, leisure, and social connections while raising concerns for digital wellbeing. Yet, reducing screen time alone risks oversimplifying technology's role and neglecting its potential for meaningful engagement. We posit self-awareness -- reflecting on one's digital behavior -- as a critical pathway to digital wellbeing. We developed WellScreen, a lightweight probe that scaffolds daily reflection by asking people to estimate and report smartphone use. In a two-week deployment (N=25), we examined how discrepancies between estimated and actual usage shaped digital awareness and wellbeing. Participants often underestimated productivity and social media while overestimating entertainment app use. They showed a 10% improvement in positive affect, rating WellScreen as moderately useful. Interviews revealed that structured reflection supported recognition of patterns, adjustment of expectations, and more intentional engagement with technology. Our findings highlight the promise of lightweight reflective interventions for supporting self-awareness and intentional digital engagement, offering implications for designing digital wellbeing tools.
Authors:Alan Boyle, Furui Cheng, Vilém Zouhar, Mennatallah El-Assady
Abstract:
Feature attribution methods, such as SHAP and LIME, explain machine learning model predictions by quantifying the influence of each input component. When applying feature attributions to explain language models, a basic question is defining the interpretable components. Traditional feature attribution methods, commonly treat individual words as atomic units. This is highly computationally inefficient for long-form text and fails to capture semantic information that spans multiple words. To address this, we present CafGa, an interactive tool for generating and evaluating feature attribution explanations at customizable granularities. CafGa supports customized segmentation with user interaction and visualizes the deletion and insertion curves for explanation assessments. Through a user study involving participants of various expertise, we confirm CafGa's usefulness, particularly among LLM practitioners. Explanations created using CafGa were also perceived as more useful compared to those generated by two fully automatic baseline methods: PartitionSHAP and MExGen, suggesting the effectiveness of the system.
Authors:Cassandra Overney, Hang Jiang, Urooj Haider, Cassandra Moe, Jasmine Mangat, Frank Pantano, Effie G. McMillian, Paul Riggins, Nabeel Gillani
Abstract:
Community engagement processes in representative political contexts, like school districts, generate massive volumes of feedback that overwhelm traditional synthesis methods, creating barriers to shared understanding not only between civic leaders and constituents but also among community members. To address these barriers, we developed StoryBuilder, a human-AI collaborative pipeline that transforms community input into accessible first-person narratives. Using 2,480 community responses from an ongoing school rezoning process, we generated 124 composite stories and deployed them through a mobile-friendly StorySharer interface. Our mixed-methods evaluation combined a four-month field deployment, user studies with 21 community members, and a controlled experiment examining how narrative composition affects participant reactions. Field results demonstrate that narratives helped community members relate across diverse perspectives. In the experiment, experience-grounded narratives generated greater respect and trust than opinion-heavy narratives. We contribute a human-AI narrative synthesis system and insights on its varied acceptance and effectiveness in a real-world civic context.
Authors:Aziz N Zeidieh, Sanchita S. Kamath, JooYoung Seo
Abstract:
Effective time management during presentations is challenging, particularly for Blind and Low-Vision (BLV) individuals, as existing tools often lack accessibility and multimodal feedback. To address this gap, we developed vashTimer: a free, open-source, and accessible iOS application. This paper demonstrates the design and functionality of vashTimer, which provides presenters with a robust tool for temporal awareness. The application delivers highly customizable alerts across four distinct modalities: visual, auditory, speech, and haptic; and supports multiple configurable intervals within a single session. By offering a flexible and non-intrusive time management solution, vashTimer empowers presenters of all visual abilities. The implications of this work extend beyond public speaking to any professional, such as a clinical therapist, who requires discreet temporal cues, fostering greater independence and focus for a wide range of users. This demonstration serves as the foundation for a planned formal user evaluation.
Authors:Courtney Heldreth, Laura M. Vardoulakis, Nicole E. Miller, Yael Haramaty, Diana Akrong, Lidan Hackmon, Lior Belinsky
Abstract:
Generative AI, which is capable of transforming static content into dynamic learning experiences, holds the potential to revolutionize student engagement in educational contexts. However, questions still remain around whether or not these tools are effective at facilitating student learning. In this research, we test the effectiveness of content transformations through Learn Your Way, an experimental research platform that transforms textbook chapters into dynamic visual and audio representations. Through a between-subjects, mixed methods experiment with 60 US-based students, we demonstrate that students who used Learn Your Way had a more positive learning experience and had better learning outcomes compared to students learning the same content via a digital reader. These findings indicate that AI-driven tools, capable of providing multiple, interactive representations of content, constitute an effective and promising method for enhancing student learning.
Authors:Jackie Chan, Fred Choi, Koustuv Saha, Eshwar Chandrasekharan
Abstract:
Social media feeds have become central to the Internet. Among the most visible are trending feeds, which rank content deemed timely and relevant. To examine how feed signals influence behaviors and perceptions, we conducted a randomized experiment (n = 585) simulating Reddit's r/popular feed. By having participants view identical sets of posts in different orders, we isolate the effects of rank and social proof on engagement and perceived relevance, trustworthiness, and quality. We found that lower-ranked posts received about 40% less engagement, despite participants rarely reporting rank as a factor in their choices. In contrast, neither rank nor social proof shifted perceptions across the three dimensions. We also observed demographic patterns: older participants were more skeptical of trending content, while those with less formal education expressed greater trust. Overall, our findings show that algorithmic curation implicitly steers attention, with implications for platform design, research on algorithmic influence, and policy.
Authors:Gaurav Seth, Hoa Pham, Giles Hamilton-Fletcher, Charles Leclercq, John-Ross Rizzo
Abstract:
Visual assistive technologies, such as Microsoft Seeing AI, can improve access to environmental information for persons with blindness or low vision (pBLV). Yet, the physical and functional implications of different device embodiments remain unclear. In this study, 11 pBLV participants used Seeing AI on a hand-held smartphone and on a head-mounted ARx Vision system to perform six activities of daily living, while their movements were captured with Xsens motion capture. Functional outcomes included task time, success rate, and number of attempts, and biomechanical measures included joint range of motion, angular path length, working volume, and movement smoothness. The head-mounted system generally reduced upper-body movement and task time, especially for document-scanning style tasks, whereas the hand-held system yielded higher success rates for tasks involving small or curved text. These findings indicate that both embodiments are viable, but they differ in terms of physical demands and ease of use. Incorporating biomechanical measures into assistive technology evaluations can inform designs that optimise user experience by balancing functional efficiency, physical sustainability, and intuitive interaction.
Authors:Jade Kandel, Jiayi Liu, Arran Zeyu Wang, Chin Tseng, Danielle Szafir
Abstract:
Visualizations support critical decision making in domains like health risk communication. This is particularly important for those at higher health risks and their care providers, allowing for better risk interpretation which may lead to more informed decisions. However, the kinds of visualizations used to represent data may impart biases that influence data interpretation and decision making. Both continuous representations using bar charts and discrete representations using icon arrays are pervasive in health risk communication, but express the same quantities using fundamentally different visual paradigms. We conducted a series of studies to investigate how bar charts, icon arrays, and their layout (juxtaposed, explicit encoding, explicit encoding plus juxtaposition) affect the perception of value comparison and subsequent decision-making in health risk communication. Our results suggest that icon arrays and explicit encoding combined with juxtaposition can optimize for both accurate difference estimation and perceptual biases in decision making. We also found misalignment between estimation accuracy and decision making, as well as between low and high literacy groups, emphasizing the importance of tailoring visualization approaches to specific audiences and evaluating visualizations beyond perceptual accuracy alone. This research contributes empirically-grounded design recommendations to improve comparison in health risk communication and support more informed decision-making across domains.
Authors:Connor Lawless, Jakob Schoeffer, Madeleine Udell
Abstract:
Optimization underpins decision-making in domains from healthcare to logistics, yet for many practitioners it remains a "magical box": powerful but opaque, difficult to use, and reliant on specialized expertise. While prior work has extensively studied machine learning workflows, the everyday practices of optimization model developers (OMDs) have received little attention. We conducted semi-structured interviews with 15 OMDs across diverse domains to examine how optimization is done in practice. Our findings reveal a highly iterative workflow spanning six stages: problem elicitation, data processing, model development, implementation, validation, and deployment. Importantly, we find that optimization practice is not only about algorithms that deliver better decisions, but is equally shaped by data and dialogue - the ongoing communication with stakeholders that enables problem framing, trust, and adoption. We discuss opportunities for future tooling that foregrounds data and dialogue alongside decision-making, opening new directions for human-centered optimization.
Authors:Xinchen Wan, Jinhua Liang, Huan Zhang
Abstract:
Existing digital mental wellness tools often overlook the nuanced emotional states underlying everyday challenges. For example, pre-sleep anxiety affects more than 1.5 billion people worldwide, yet current approaches remain largely static and "one-size-fits-all", failing to adapt to individual needs. In this work, we present EmoHeal, an end-to-end system that delivers personalized, three-stage supportive narratives. EmoHeal detects 27 fine-grained emotions from user text with a fine-tuned XLM-RoBERTa model, mapping them to musical parameters via a knowledge graph grounded in music therapy principles (GEMS, iso-principle). EmoHeal retrieves audiovisual content using the CLAMP3 model to guide users from their current state toward a calmer one ("match-guide-target"). A within-subjects study (N=40) demonstrated significant supportive effects, with participants reporting substantial mood improvement (M=4.12, p<0.001) and high perceived emotion recognition accuracy (M=4.05, p<0.001). A strong correlation between perceived accuracy and therapeutic outcome (r=0.72, p<0.001) validates our fine-grained approach. These findings establish the viability of theory-driven, emotion-aware digital wellness tools and provides a scalable AI blueprint for operationalizing music therapy principles.
Authors:Ryuhaerang Choi, Taehan Kim, Subin Park, Seohyeon Yoo, Jennifer G. Kim, Sung-Ju Lee
Abstract:
Peer recovery narratives provide unique benefits beyond professional or lay mentoring by fostering hope and sustained recovery in eating disorder (ED) contexts. Yet, such support is limited by the scarcity of peer-involved programs and potential drawbacks on recovered peers, including relapse risk. To address this, we designed RecoveryTeller, a chatbot adopting a recovered-peer persona that portrays itself as someone recovered from an ED. We examined whether such a persona can reproduce the support affordances of peer recovery narratives. We compared RecoveryTeller with a lay-mentor persona chatbot offering similar guidance but without a recovery background. We conducted a 20-day cross-over deployment study with 26 ED participants, each using both chatbots for 10 days. RecoveryTeller elicited stronger emotional resonance than a lay-mentor chatbot, yet tensions between emotional and epistemic trust led participants to view the two personas as complementary rather than substitutes. We provide design implications for mental health chatbot persona design.
Authors:Huanchen Wang, Wencheng Zhang, Zhiqiang Wang, Zhicong Lu, Yuxin Ma
Abstract:
Vision-language (VL) models have shown transformative potential across various critical domains due to their capability to comprehend multi-modal information. However, their performance frequently degrades under distribution shifts, making it crucial to assess and improve robustness against real-world data corruption encountered in practical applications. While advancements in VL benchmark datasets and data augmentation (DA) have contributed to robustness evaluation and improvement, there remain challenges due to a lack of in-depth comprehension of model behavior as well as the need for expertise and iterative efforts to explore data patterns. Given the achievement of visualization in explaining complex models and exploring large-scale data, understanding the impact of various data corruption on VL models aligns naturally with a visual analytics approach. To address these challenges, we introduce VisMoDAl, a visual analytics framework designed to evaluate VL model robustness against various corruption types and identify underperformed samples to guide the development of effective DA strategies. Grounded in the literature review and expert discussions, VisMoDAl supports multi-level analysis, ranging from examining performance under specific corruptions to task-driven inspection of model behavior and corresponding data slice. Unlike conventional works, VisMoDAl enables users to reason about the effects of corruption on VL models, facilitating both model behavior understanding and DA strategy formulation. The utility of our system is demonstrated through case studies and quantitative evaluations focused on corruption robustness in the image captioning task.
Authors:Emily Sumner, Deepak E. Gopinath, Laporsha Dees, Patricio Reyes Gomez, Xiongyi Cui, Andrew Silva, Jean Costa, Allison Morgan, Mariah Schrum, Tiffany L. Chen, Avinash Balachandran, Guy Rosman
Abstract:
Curated datasets are essential for training and evaluating AI approaches, but are often lacking in domains where language and physical action are deeply intertwined. In particular, few datasets capture how people acquire embodied skills through verbal instruction over time. To address this gap, we introduce SimCoachCorpus: a unique dataset of race car simulator driving that allows for the investigation of rich interactive phenomena during guided and unguided motor skill acquisition. In this dataset, 29 humans were asked to drive in a simulator around a race track for approximately ninety minutes. Fifteen participants were given personalized one-on-one instruction from a professional performance driving coach, and 14 participants drove without coaching. \name\ includes embodied features such as vehicle state and inputs, map (track boundaries and raceline), and cone landmarks. These are synchronized with concurrent verbal coaching from a professional coach and additional feedback at the end of each lap. We further provide annotations of coaching categories for each concurrent feedback utterance, ratings on students' compliance with coaching advice, and self-reported cognitive load and emotional state of participants (gathered from surveys during the study). The dataset includes over 20,000 concurrent feedback utterances, over 400 terminal feedback utterances, and over 40 hours of vehicle driving data. Our naturalistic dataset can be used for investigating motor learning dynamics, exploring linguistic phenomena, and training computational models of teaching. We demonstrate applications of this dataset for in-context learning, imitation learning, and topic modeling. The dataset introduced in this work will be released publicly upon publication of the peer-reviewed version of this paper. Researchers interested in early access may register at https://tinyurl.com/SimCoachCorpusForm.
Authors:Jade Kandel, Sriya Kasumarthi, Danielle Szafir
Abstract:
Physical therapy (PT) is crucial in helping older adults manage chronic conditions and weakening muscles, but older adults face increasing challenges that can impact their PT experience, including increased fatigue, memory loss, and mobility and travel constraints. While current technology attempts to facilitate remote care, they have limitations and are used in-practice infrequently. Mixed reality (MR) technology shows promise for addressing these challenges by creating immersive, context-aware environments remotely that previously could only be achieved in clinical settings. To bridge the gap between MR's potential and its practical application in geriatric PT, we conducted in-depth interviews with three PT clinicians and six older adult patients to understand challenges with PT care and adherence that MR may address. Our findings inform design considerations for supporting older adults' needs through MR and outline technical requirements for practical implementation.
Authors:Sanchita S. Kamath, Omar Khan, Aziz N Zeidieh, JooYoung Seo
Abstract:
Statistical concepts often rely heavily on visual cues for comprehension, presenting challenges for individuals who face difficulties using visual information, such as the blind and low-vision (BLV) community. While prior work has explored making data visualizations accessible, limited research examines how BLV individuals conceptualize and learn the underlying statistical concepts these visualizations represent. To better understand BLV individuals' learning strategies for potentially unfamiliar statistical concepts, we conducted a within-subjects experiment with 7 BLV individuals, controlling for vision condition using blindfolds. Each participant leveraged three different non-visual representations (Swell Touch tactile graph (STGs), shaped data patterns on a refreshable display (BDPs), sonification) to understand three different statistical concepts in histograms (skewness, modality, kurtosis). We collected quantitative metrics (accuracy, completion time, self-reported confidence levels) and qualitative insights (gesture analysis) to identify participants' unique meaning-making strategies. Results revealed that the braille condition led to the most accurate results, with sonification tasks being completed the fastest. Participants demonstrated various adaptive techniques when exploring each histogram, often developing alternative mental models that helped them non-visually encode statistical visualization concepts. Our findings reveal important implications for statistics educators and assistive technology designers, suggesting that effective learning tools must go beyond simple translation of visual information to support the unique cognitive strategies employed by BLV learners.
Authors:Farnaz Jahanbakhsh, Dora Zhao, Tiziano Piccardi, Zachary Robertson, Ziv Epstein, Sanmi Koyejo, Michael S. Bernstein
Abstract:
While social media feed rankings are primarily driven by engagement signals rather than any explicit value system, the resulting algorithmic feeds are not value-neutral: engagement may prioritize specific individualistic values. This paper presents an approach for social media feed value alignment. We adopt Schwartz's theory of Basic Human Values -- a broad set of human values that articulates complementary and opposing values forming the building blocks of many cultures -- and we implement an algorithmic approach that models and then ranks feeds by expressions of Schwartz's values in social media posts. Our approach enables controls where users can express weights on their desired values, combining these weights and post value expressions into a ranking that respects users' articulated trade-offs. Through controlled experiments (N=141 and N=250), we demonstrate that users can use these controls to architect feeds reflecting their desired values. Across users, value-ranked feeds align with personal values, diverging substantially from existing engagement-driven feeds.
Authors:Jiahui An, Bingjie Cheng, Dmitriy Rudyka, Elisa Donati, Sara Fabrikant
Abstract:
Brain computer interfaces enable real-time monitoring of cognitive load, but their effectiveness in dynamic navigation contexts is not well established. Using an existing VR navigation dataset, we examined whether EEG signals can classify cognitive load during map-based wayfinding and whether classification accuracy depends more on task complexity or on individual traits. EEG recordings from forty-six participants navigating routes with 3, 5, or 7 map landmarks were analyzed with a nested cross-validation framework across multiple machine learning models. Classification achieved mean accuracies up to 90.8% for binary contrasts (3 vs. 7 landmarks) and 78.7% for the three-class problem, both well above chance. Demographic and cognitive variables (age, gender, spatial ability, working memory) showed no significant influence. These findings demonstrate that task demands outweigh individual differences in shaping classification performance, highlighting the potential for task-adaptive navigation systems that dynamically adjust map complexity in response to real-time cognitive states.
Authors:Hongjin Lin, Anna Kawakami, Catherine D'Ignazio, Kenneth Holstein, Krzysztof Gajos
Abstract:
Artificial Intelligence for Social Good (AI4SG) is a growing area exploring AI's potential to address social issues like public health. Yet prior work has shown limited evidence of its tangible benefits for intended communities, and projects frequently face inadequate community engagement and sustainability challenges. Funding agendas play a crucial role in framing AI4SG initiatives and shaping their approaches. Through a qualitative analysis of 35 funding documents -- representing about $410 million USD in total investments, we reveal dissonances between AI4SG's stated intentions for positive social impact and the techno-centric approaches that some funding agendas promoted. Drawing on our findings, we offer recommendations for funders to scaffold approaches that balance both contextual understanding and technical capacities in future funding call designs. We call for greater engagement between AI4SG funders and the HCI community to support community engagement work in the funding program design process.
Authors:Jonas C. Ditz, Veronika Lazar, Elmar LichtmeÃ, Carola Plesch, Matthias Heck, Kevin Baum, Markus Langer
Abstract:
Human oversight of AI is promoted as a safeguard against risks such as inaccurate outputs, system malfunctions, or violations of fundamental rights, and is mandated in regulation like the European AI Act. Yet debates on human oversight have largely focused on its effectiveness, while overlooking a critical dimension: the security of human oversight. We argue that human oversight creates a new attack surface within the safety, security, and accountability architecture of AI operations. Drawing on cybersecurity perspectives, we analyze attack vectors that threaten the requirements of effective human oversight, thereby undermining the safety of AI operations. Such attacks may target the AI system, its communication with oversight personnel, or the personnel themselves. We then outline hardening strategies to mitigate these risks. Our contributions are: (1) introducing a security perspective on human oversight, and (2) providing an overview of attack vectors and hardening strategies to enable secure human oversight of AI.
Authors:Tim Zindulka, Sven Goller, Daniela Fernandes, Robin Welsch, Daniel Buschek
Abstract:
As large language models (LLMs) become embedded in interactive text generation, disclosure of AI as a source depends on people remembering which ideas or texts came from themselves and which were created with AI. We investigate how accurately people remember the source of content when using AI. In a pre-registered experiment, 184 participants generated and elaborated on ideas both unaided and with an LLM-based chatbot. One week later, they were asked to identify the source (noAI vs withAI) of these ideas and texts. Our findings reveal a significant gap in memory: After AI use, the odds of correct attribution dropped, with the steepest decline in mixed human-AI workflows, where either the idea or elaboration was created with AI. We validated our results using a computational model of source memory. Discussing broader implications, we highlight the importance of considering source confusion in the design and use of interactive text generation technologies.
Authors:Avinash Agarwal, Manisha J. Nene
Abstract:
Purpose: The governance of artificial iintelligence (AI) systems requires a structured approach that connects high-level regulatory principles with practical implementation. Existing frameworks lack clarity on how regulations translate into conformity mechanisms, leading to gaps in compliance and enforcement. This paper addresses this critical gap in AI governance. Methodology/Approach: A five-layer AI governance framework is proposed, spanning from broad regulatory mandates to specific standards, assessment methodologies, and certification processes. By narrowing its scope through progressively focused layers, the framework provides a structured pathway to meet technical, regulatory, and ethical requirements. Its applicability is validated through two case studies on AI fairness and AI incident reporting. Findings: The case studies demonstrate the framework's ability to identify gaps in legal mandates, standardization, and implementation. It adapts to both global and region-specific AI governance needs, mapping regulatory mandates with practical applications to improve compliance and risk management. Practical Implications - By offering a clear and actionable roadmap, this work contributes to global AI governance by equipping policymakers, regulators, and industry stakeholders with a model to enhance compliance and risk management. Social Implications: The framework supports the development of policies that build public trust and promote the ethical use of AI for the benefit of society. Originality/Value: This study proposes a five-layer AI governance framework that bridges high-level regulatory mandates and implementation guidelines. Validated through case studies on AI fairness and incident reporting, it identifies gaps such as missing standardized assessment procedures and reporting mechanisms, providing a structured foundation for targeted governance measures.
Authors:Yuheng Yang, Wenjia Jiang, Yang Wang, Yiwei Wang, Chi Zhang
Abstract:
The rapid progress of large language models (LLMs) has opened new opportunities for education. While learners can interact with academic papers through LLM-powered dialogue, limitations still exist: absence of structured organization and high text reliance can impede systematic understanding and engagement with complex concepts. To address these challenges, we propose Auto-Slides, an LLM-driven system that converts research papers into pedagogically structured, multimodal slides (e.g., diagrams and tables). Drawing on cognitive science, it creates a presentation-oriented narrative and allows iterative refinement via an interactive editor, in order to match learners' knowledge level and goals. Auto-Slides further incorporates verification and knowledge retrieval mechanisms to ensure accuracy and contextual completeness. Through extensive user studies, Auto-Slides enhances learners' comprehension and engagement compared to conventional LLM-based reading. Our contributions lie in designing a multi-agent framework for transforming academic papers into pedagogically optimized slides and introducing interactive customization for personalized learning.
Authors:Farhana Shahid, Stella Zhang, Aditya Vashistha
Abstract:
Large language models (LLMs) are increasingly used to promote prosocial and constructive discourse online. Yet little is known about how they negotiate and shape underlying values when reframing people's arguments on value-laden topics. We conducted experiments with 347 participants from India and the United States, who wrote constructive comments on homophobic and Islamophobic threads, and reviewed human-written and LLM-rewritten versions of these comments. Our analysis shows that LLM systematically diminishes Conservative values while elevating prosocial values such as Benevolence and Universalism. When these comments were read by others, participants opposing same-sex marriage or Islam found human-written comments more aligned with their values, whereas those supportive of these communities found LLM-rewritten versions more aligned with their values. These findings suggest that LLM-driven value homogenization can shape how diverse viewpoints are represented in contentious debates on value-laden topics and may influence the dynamics of online discourse critically.
Authors:Ambre Assor, Jean-Daniel Fekete
Abstract:
We introduce our ongoing work toward an insight-based evaluation methodology aimed at understanding practitioners' mental models when exploring medical data. It is based on ParcoursVis, a Progressive Visual Analytics system designed to visualize event sequences derived from Electronic Health Records at scale (millions of patients, billions of events), developed in collaboration with the Emergency Departments of 16 Parisian hospitals and with the French Social Security. Building on prior usability validation, our current evaluation focuses on the insights generated by expert users and aims to better understand the exploration strategies they employ when engaging with exploration visualization tools. We describe our system and outline our evaluation protocol, analysis strategy, and preliminary findings. Building on this approach and our pilot results, we contribute a design protocol for conducting insight-based studies under real-world constraints, including the availability of health practitioners whom we were fortunate to interview. Our findings highlight a loop, where the use of the system helps refine data variables identification and the system itself. We aim to shed light on generated insights, to highlight the utility of exploratory tools in health data analysis contexts.
Authors:Pranav Khadpe, Kimi Wenzel, George Loewenstein, Geoff Kaufman
Abstract:
When someone sends us a thoughtful message, we naturally form judgments about their character. But what happens when that message carries a label indicating it was written with the help of AI? This paper investigates how the appearance of AI assistance affects our perceptions of message senders. Adding nuance to previous research, through two studies (N=399) featuring vignette scenarios, we find that AI-assistance labels don't necessarily make people view senders negatively. Rather, they dampen the strength of character signals in communication. We show that when someone sends a warmth-signalling message (like thanking or apologizing) without AI help, people more strongly categorize the sender as warm. At the same time, when someone sends a coldness-signalling message (like bragging or blaming) without assistance, people more confidently categorize them as cold. Interestingly, AI labels weaken both these associations: An AI-assisted apology makes the sender appear less warm than if they had written it themselves, and an AI-assisted blame makes the sender appear less cold than if they had composed it independently. This supports our signal diagnosticity explanation: messages labeled as AI-assisted are viewed as less diagnostic than messages which seem unassisted. We discuss how our findings shed light on the causal origins of previously reported observations in AI-Mediated Communication.
Authors:Avinash Agarwal, Manisha J. Nene
Abstract:
The integration of artificial intelligence (AI) into telecommunications infrastructure introduces novel risks, such as algorithmic bias and unpredictable system behavior, that fall outside the scope of traditional cybersecurity and data protection frameworks. This paper introduces a precise definition and a detailed typology of telecommunications AI incidents, establishing them as a distinct category of risk that extends beyond conventional cybersecurity and data protection breaches. It argues for their recognition as a distinct regulatory concern. Using India as a case study for jurisdictions that lack a horizontal AI law, the paper analyzes the country's key digital regulations. The analysis reveals that India's existing legal instruments, including the Telecommunications Act, 2023, the CERT-In Rules, and the Digital Personal Data Protection Act, 2023, focus on cybersecurity and data breaches, creating a significant regulatory gap for AI-specific operational incidents, such as performance degradation and algorithmic bias. The paper also examines structural barriers to disclosure and the limitations of existing AI incident repositories. Based on these findings, the paper proposes targeted policy recommendations centered on integrating AI incident reporting into India's existing telecom governance. Key proposals include mandating reporting for high-risk AI failures, designating an existing government body as a nodal agency to manage incident data, and developing standardized reporting frameworks. These recommendations aim to enhance regulatory clarity and strengthen long-term resilience, offering a pragmatic and replicable blueprint for other nations seeking to govern AI risks within their existing sectoral frameworks.
Authors:Lukas Schach, Christian Rack, Ryan P. McMahan, Marc Erich Latoschik
Abstract:
This paper examines the generalization capacity of two state-of-the-art classification and similarity learning models in reliably identifying users based on their motions in various Extended Reality (XR) applications. We developed a novel dataset containing a wide range of motion data from 49 users in five different XR applications: four XR games with distinct tasks and action patterns, and an additional social XR application with no predefined task sets. The dataset is used to evaluate the performance and, in particular, the generalization capacity of the two models across applications. Our results indicate that while the models can accurately identify individuals within the same application, their ability to identify users across different XR applications remains limited. Overall, our results provide insight into current models generalization capabilities and suitability as biometric methods for user verification and identification. The results also serve as a much-needed risk assessment of hazardous and unwanted user identification in XR and Metaverse applications. Our cross-application XR motion dataset and code are made available to the public to encourage similar research on the generalization of motion-based user identification in typical Metaverse application use cases.
Authors:Jacob Beck, Stephanie Eckman, Christoph Kern, Frauke Kreuter
Abstract:
Human-AI collaboration increasingly drives decision-making across industries, from medical diagnosis to content moderation. While AI systems promise efficiency gains by providing automated suggestions for human review, these workflows can trigger cognitive biases that degrade performance. We know little about the psychological factors that determine when these collaborations succeed or fail. We conducted a randomized experiment with 2,784 participants to examine how task design and individual characteristics shape human responses to AI-generated suggestions. Using a controlled annotation task, we manipulated three factors: AI suggestion quality in the first three instances, task burden through required corrections, and performance-based financial incentives. We collected demographics, attitudes toward AI, and behavioral data to assess four performance metrics: accuracy, correction activity, overcorrection, and undercorrection. Two patterns emerged that challenge conventional assumptions about human-AI collaboration. First, requiring corrections for flagged AI errors reduced engagement and increased the tendency to accept incorrect suggestions, demonstrating how cognitive shortcuts influence collaborative outcomes. Second, individual attitudes toward AI emerged as the strongest predictor of performance, surpassing demographic factors. Participants skeptical of AI detected errors more reliably and achieved higher accuracy, while those favorable toward automation exhibited dangerous overreliance on algorithmic suggestions. The findings reveal that successful human-AI collaboration depends not only on algorithmic performance but also on who reviews AI outputs and how review processes are structured. Effective human-AI collaborations require consideration of human psychology: selecting diverse evaluator samples, measuring attitudes, and designing workflows that counteract cognitive biases.
Authors:Li Ye, Lei Wang, Lihong Cai, Ruiqi Yu, Yong Wang, Yigang Wang, Wei Chen, Zhiguang Zhou
Abstract:
Massive Open Online Courses (MOOCs) have become increasingly popular worldwide. However, learners primarily rely on watching videos, easily losing knowledge context and reducing learning effectiveness. We propose HyperMOOC, a novel approach augmenting MOOC videos with concept-based embedded visualizations to help learners maintain knowledge context. Informed by expert interviews and literature review, HyperMOOC employs multi-glyph designs for different knowledge types and multi-stage interactions for deeper understanding. Using a timeline-based radial visualization, learners can grasp cognitive paths of concepts and navigate courses through hyperlink-based interactions. We evaluated HyperMOOC through a user study with 36 MOOC learners and interviews with two instructors. Results demonstrate that HyperMOOC enhances learners' learning effect and efficiency on MOOCs, with participants showing higher satisfaction and improved course understanding compared to traditional video-based learning approaches.
Authors:Alvaro Becerra, Ruth Cobos, Charles Lang
Abstract:
In modern online learning, understanding and predicting student behavior is crucial for enhancing engagement and optimizing educational outcomes. This systematic review explores the integration of biosensors and Multimodal Learning Analytics (MmLA) to analyze and predict student behavior during computer-based learning sessions. We examine key challenges, including emotion and attention detection, behavioral analysis, experimental design, and demographic considerations in data collection. Our study highlights the growing role of physiological signals, such as heart rate, brain activity, and eye-tracking, combined with traditional interaction data and self-reports to gain deeper insights into cognitive states and engagement levels. We synthesize findings from 54 key studies, analyzing commonly used methodologies such as advanced machine learning algorithms and multimodal data pre-processing techniques. The review identifies current research trends, limitations, and emerging directions in the field, emphasizing the transformative potential of biosensor-driven adaptive learning systems. Our findings suggest that integrating multimodal data can facilitate personalized learning experiences, real-time feedback, and intelligent educational interventions, ultimately advancing toward a more customized and adaptive online learning experience.
Authors:Lazaros Stavrinou, Argyris Constantinides, Marios Belk, Vasos Vassiliou, Fotis Liarokapis, Marios Constantinides
Abstract:
Short-form videos are gaining popularity in education due to their concise and accessible format that enables microlearning. Yet, most of these videos are manually created. Even for those automatically generated using artificial intelligence (AI), it is not well understood whether or how they affect learning outcomes, user experience, and trust. To address this gap, we developed ReelsEd, which is a web-based system that uses large language models (LLMs) to automatically generate structured short-form video (i.e., reels) from lecture long-form videos while preserving instructor-authored material. In a between-subject user study with 62 university students, we evaluated ReelsEd and demonstrated that it outperformed traditional long-form videos in engagement, quiz performance, and task efficiency without increasing cognitive load. Learners expressed high trust in our system and valued its clarity, usefulness, and ease of navigation. Our findings point to new design opportunities for integrating generative AI into educational tools that prioritize usability, learner agency, and pedagogical alignment.
Authors:Evgeny V. Votyakov, Marios Constantinides, Fotis Liarokapis
Abstract:
Amateur runners are increasingly using wearable devices to track their training, and often do so through simple metrics such as heart rate and pace. However, these metrics are typically analyzed in isolation and lack the explainability needed for long-term self-monitoring. In this paper, we first present Fitplotter, which is a client-side web application designed for the visualization and analysis of data associated with fitness and activity tracking devices. Next, we revisited and formalized Heart Rate Efficiency (HRE), defined as the product of pace and heart rate, as a practical and explainable metric to track aerobic fitness in everyday running. Drawing on more than a decade of training data from one athlete, and supplemented by publicly available logs from twelve runners, we showed that HRE provides more stable and meaningful feedback on aerobic development than heart rate or pace alone. We showed that HRE correlates with training volume, reflects seasonal progress, and remains stable during long runs in well-trained individuals. We also discuss how HRE can support everyday training decisions, improve the user experience in fitness tracking, and serve as an explainable metric to proprietary ones of commercial platforms. Our findings have implications for designing user-centered fitness tools that empower amateur athletes to understand and manage their own performance data.
Authors:Chenguang Wang, Xiang Yan, Yilong Dai, Ziyi Wang, Susu Xu
Abstract:
Realistic visual renderings of street-design scenarios are essential for public engagement in active transportation planning. Traditional approaches are labor-intensive, hindering collective deliberation and collaborative decision-making. While AI-assisted generative design shows transformative potential by enabling rapid creation of design scenarios, existing generative approaches typically require large amounts of domain-specific training data and struggle to enable precise spatial variations of design/configuration in complex street-view scenes. We introduce a multi-agent system that edits and redesigns bicycle facilities directly on real-world street-view imagery. The framework integrates lane localization, prompt optimization, design generation, and automated evaluation to synthesize realistic, contextually appropriate designs. Experiments across diverse urban scenarios demonstrate that the system can adapt to varying road geometries and environmental conditions, consistently yielding visually coherent and instruction-compliant results. This work establishes a foundation for applying multi-agent pipelines to transportation infrastructure planning and facility design.
Authors:Atikkhan Faridkhan Nilgar, Manuel Dietrich, Kristof Van Laerhoven
Abstract:
Social robots are increasingly recognized as valuable supporters in the field of well-being coaching. They can function as independent coaches or provide support alongside human coaches, and healthcare professionals. In coaching interactions, these robots often handle sensitive information shared by users, making privacy a relevant issue. Despite this, little is known about the factors that shape users' privacy perceptions. This research aims to examine three key factors systematically: (1) the transparency about information usage, (2) the level of specific user control over how the robot uses their information, and (3) the robot's behavioral approach - whether it acts proactively or only responds on demand. Our results from an online study (N = 200) show that even when users grant the robot general access to personal data, they additionally expect the ability to explicitly control how that information is interpreted and shared during sessions. Experimental conditions that provided such control received significantly higher ratings for perceived privacy appropriateness and trust. Compared to user control, the effects of transparency and proactivity on privacy appropriateness perception were low, and we found no significant impact. The results suggest that merely informing users or proactive sharing is insufficient without accompanying user control. These insights underscore the need for further research on mechanisms that allow users to manage robots' information processing and sharing, especially when social robots take on more proactive roles alongside humans.
Authors:Atikkhan Faridkhan Nilgar, Kristof Van Laerhoven, Ayub Kinoti
Abstract:
We present SRWToolkit, an open-source Wizard of Oz toolkit designed to facilitate the rapid prototyping of social robotic avatars powered by local large language models (LLMs). Our web-based toolkit enables multimodal interaction through text input, button-activated speech, and wake-word command. The toolkit offers real-time configuration of avatar appearance, behavior, language, and voice via an intuitive control panel. In contrast to prior works that rely on cloud-based LLM services, SRWToolkit emphasizes modularity and ensures on-device functionality through local LLM inference. In our small-scale user study ($n=11$), participants created and interacted with diverse robotic roles (hospital receptionist, mathematics teacher, and driving assistant), which demonstrated positive outcomes in the toolkit's usability, trust, and user experience. The toolkit enables rapid and efficient development of robot characters customized to researchers' needs, supporting scalable research in human-robot interaction.
Authors:Sangbong Yoo, Seongbum Seo, Chanyoung Yoon, Hyelim Lee, Jeong-Nam Kim, Chansoo Kim, Yun Jang, Takanori Fujiwara
Abstract:
Analysis of public opinions collected from digital media helps organizations maintain positive relationships with the public. Such public relations (PR) analysis often involves assessing opinions, for example, measuring how strongly people trust an organization. Pre-trained Large Language Models (LLMs) hold great promise for supporting Organization-Public Relationship Assessment (OPRA) because they can map unstructured public text to OPRA dimensions and articulate rationales through prompting. However, adapting LLMs for PR analysis typically requires fine-tuning on large labeled datasets, which is both labor-intensive and knowledge-intensive, making it difficult for PR researchers to apply these models. In this paper, we present OPRA-Vis, a visual analytics system that leverages LLMs for OPRA without requiring extensive labeled data. Our framework employs Chain-of-Thought prompting to guide LLMs in analyzing public opinion data by incorporating PR expertise directly into the reasoning process. Furthermore, OPRA-Vis provides visualizations that reveal the clues and reasoning paths used by LLMs, enabling users to explore, critique, and refine model decisions. We demonstrate the effectiveness of OPRA-Vis through two real-world use cases and evaluate it quantitatively, through comparisons with alternative LLMs and prompting strategies, and qualitatively, through assessments of usability, effectiveness, and expert feedback.
Authors:Jinjoo Shim, Jacob Hunecke, Elgar Fleisch, Filipe Barata
Abstract:
Every day, millions of people worldwide track their steps, sleep, and activity rhythms with smartwatches and fitness trackers. These continuously collected data streams present a remarkable opportunity to transform routine self-tracking into meaningful health insights that enable individuals to understand -- and potentially influence -- their biological aging. Yet most tools for analyzing wearable data remain fragmented, proprietary, and inaccessible, creating a major barrier between this vast reservoir of personal health information and its translation into actionable insights on aging. CosinorAge is an open-source framework that estimates biological age from wearable-derived circadian, physical activity, and sleep metrics. It addresses the lack of unified, reproducible pipelines for jointly analyzing rest-activity rhythmicity, physical activity, and sleep, and linking them to health outcomes. The Python package provides an end-to-end workflow from raw data ingestion and preprocessing to feature computation and biological age estimation, supporting multiple input sources across wearables and smartwatch. It also makes available trained model parameters (open weights) derived from large-scale population datasets such as UK Biobank, enabling reproducibility, transparency, and generalizability across studies. Its companion web-based CosinorAge Calculator enables non-technical users to access identical analytical capabilities through an intuitive interface. By combining transparent, reproducible analysis with broad accessibility, CosinorAge advances scalable, personalized health monitoring and bridges digital health technologies with biological aging research.
Authors:Matthew Varona, Karen Bonilla, Maryam Hedayati, Alark Joshi, Lane Harrison, Matthew Kay, Carolina Nobre
Abstract:
Research in visualization literacy explores the skills required to engage with visualizations. This state-of-the-art report surveys the current literature in visualization literacy to provide a comprehensive overview of the field. We propose a taxonomy of visualization literacy that organizes the field into competency themes and research categories. To address ambiguity surrounding the term ``visualization literacy'', we provide a framework for operationalizing visualization literacy based on application contexts (including domain, scenario, and audience) and relevant competencies, which are categorized under consumption, construction, critique, and connection. Research contributions are organized into five categories: ontology, assessment, mechanisms, populiteracy, and intervention. For each category, we identify key trends, discuss which competencies are addressed, highlight open challenges, and examine how advancements within these areas inform and reinforce each other, driving progress in the field.
Authors:Akriti Verma, Shama Islam, Valeh Moghaddam, Adnan Anwar
Abstract:
The pervasiveness of online toxicity, including hate speech and trolling, disrupts digital interactions and online well-being. Previous research has mainly focused on post-hoc moderation, overlooking the real-time emotional dynamics of online conversations and the impact of users' emotions on others. This paper presents a graph-based framework to identify the need for emotion regulation within online conversations. This framework promotes self-reflection to manage emotional responses and encourage responsible behaviour in real time. Additionally, a comment queuing mechanism is proposed to address intentional trolls who exploit emotions to inflame conversations. This mechanism introduces a delay in publishing comments, giving users time to self-regulate before further engaging in the conversation and helping maintain emotional balance. Analysis of social media data from Twitter and Reddit demonstrates that the graph-based framework reduced toxicity by 12%, while the comment queuing mechanism decreased the spread of anger by 15%, with only 4% of comments being temporarily held on average. These findings indicate that combining real-time emotion regulation with delayed moderation can significantly improve well-being in online environments.
Authors:Bin Han, Deuksin Kwon, Spencer Lin, Kaleen Shrestha, Jonathan Gratch
Abstract:
This study proposes a framework that employs personality prompting with Large Language Models to generate verbal and nonverbal behaviors for virtual agents based on personality traits. Focusing on extraversion, we evaluated the system in two scenarios: negotiation and ice breaking, using both introverted and extroverted agents. In Experiment 1, we conducted agent to agent simulations and performed linguistic analysis and personality classification to assess whether the LLM generated language reflected the intended traits and whether the corresponding nonverbal behaviors varied by personality. In Experiment 2, we carried out a user study to evaluate whether these personality aligned behaviors were consistent with their intended traits and perceptible to human observers. Our results show that LLMs can generate verbal and nonverbal behaviors that align with personality traits, and that users are able to recognize these traits through the agents' behaviors. This work underscores the potential of LLMs in shaping personality aligned virtual agents.
Authors:Yutong Zhang, Taeuk Kang, Sydney Yeh, Anavi Baddepudi, Lindsay Popowski, Tiziano Piccardi, Michael S. Bernstein
Abstract:
Positive social interactions can occur in groups of many shapes and sizes, spanning from small and private to large and open. However, social media tends to binarize our experiences into either isolated small groups or into large public squares. In this paper, we introduce Burst, a social media design that allows users to share and curate content between many spaces of varied size and composition. Users initially post content to small trusted groups, who can then burst that content, routing it to the groups that would be the best audience. We instantiate this approach into a mobile phone application, and demonstrate through a ten-day field study (N=36) that Burst enabled a participatory curation culture. With this work, we aim to articulate potential new design directions for social media sharing.
Authors:Amrit Poudel, Maria Milkowski, Tim Weninger
Abstract:
Search engines play a central role in how people gather information, but subtle cues like headline framing may influence not only what users believe but also how they search. While framing effects on judgment are well documented, their impact on subsequent search behavior is less understood. We conducted a controlled experiment where participants issued queries and selected from headlines filtered by specific linguistic frames. Headline framing significantly shaped follow-up queries: conflict and strategy frames disrupted alignment with prior selections, while episodic frames led to more concrete queries than thematic ones. We also observed modest short-term frame persistence that declined over time. These results suggest that even brief exposure to framing can meaningfully alter the direction of users information-seeking behavior.
Authors:Md Tariquzzaman, Md Farhan Ishmam, Saiyma Sittul Muna, Md Kamrul Hasan, Hasan Mahmud
Abstract:
Sign Language (SL) enables two-way communication for the deaf and hard-of-hearing community, yet many sign languages remain under-resourced in the AI space. Sign Language Instruction Generation (SLIG) produces step-by-step textual instructions that enable non-SL users to imitate and learn SL gestures, promoting two-way interaction. We introduce BdSLIG, the first Bengali SLIG dataset, used to evaluate Vision Language Models (VLMs) (i) on under-resourced SLIG tasks, and (ii) on long-tail visual concepts, as Bengali SL is unlikely to appear in the VLM pre-training data. To enhance zero-shot performance, we introduce Sign Parameter-Infused (SPI) prompting, which integrates standard SL parameters, like hand shape, motion, and orientation, directly into the textual prompts. Subsuming standard sign parameters into the prompt makes the instructions more structured and reproducible than free-form natural text from vanilla prompting. We envision that our work would promote inclusivity and advancement in SL learning systems for the under-resourced communities.
Authors:Jon E. Froehlich, Jared Hwang, Zeyu Wang, John S. O'Meara, Xia Su, William Huang, Yang Zhang, Alex Fiannaca, Philip Nelson, Shaun Kane
Abstract:
Interactive digital maps have revolutionized how people travel and learn about the world; however, they rely on pre-existing structured data in GIS databases (e.g., road networks, POI indices), limiting their ability to address geo-visual questions related to what the world looks like. We introduce our vision for Geo-Visual Agents--multimodal AI agents capable of understanding and responding to nuanced visual-spatial inquiries about the world by analyzing large-scale repositories of geospatial images, including streetscapes (e.g., Google Street View), place-based photos (e.g., TripAdvisor, Yelp), and aerial imagery (e.g., satellite photos) combined with traditional GIS data sources. We define our vision, describe sensing and interaction approaches, provide three exemplars, and enumerate key challenges and opportunities for future work.
Authors:Dooyoung Kim, Woontack Woo
Abstract:
We propose the Spatio-Temporal Mixed and Augmented Reality Experience Description (MAR-ED), a novel framework to standardize the representation of past events for interactive and adaptive playback in a user's present physical space. While current spatial media technologies have primarily focused on capturing or replaying content as static assets, often disconnected from the viewer's environment or offering limited interactivity, the means to describe an experience's underlying semantic and interactive structure remains underexplored. We propose a descriptive framework called MAR-ED based on three core primitives: 1) Event Primitives for semantic scene graph representation, 2) Keyframe Primitives for efficient and meaningful data access, and 3) Playback Primitives for user-driven adaptive interactive playback of recorded MAR experience. The proposed flowchart of the three-stage process of the proposed MAR-ED framework transforms a recorded experience into a unique adaptive MAR experience during playback, where its spatio-temporal structure dynamically conforms to a new environment and its narrative can be altered by live user input. Drawing on this framework, personal digital memories and recorded events can evolve beyond passive 2D/3D videos into immersive, spatially-integrated group experiences, opening new paradigms for training, cultural heritage, and interactive storytelling without requiring complex, per-user adaptive rendering.
Authors:Yuge Zhang, Nan Chen, Jiahang Xu, Yuqing Yang
Abstract:
Large Language Models (LLMs) require sophisticated prompting, yet current practices face challenges in structure, data integration, format sensitivity, and tooling. Existing methods lack comprehensive solutions for organizing complex prompts involving diverse data types (documents, tables, images) or managing presentation variations systematically. To address these gaps, we introduce POML (Prompt Orchestration Markup Language). POML employs component-based markup for logical structure (roles, tasks, examples), specialized tags for seamless data integration, and a CSS-like styling system to decouple content from presentation, reducing formatting sensitivity. It includes templating for dynamic prompts and a comprehensive developer toolkit (IDE support, SDKs) to improve version control and collaboration. We validate POML through two case studies demonstrating its impact on complex application integration (PomLink) and accuracy performance (TableQA), as well as a user study assessing its effectiveness in real-world development scenarios.
Authors:Maciej Skorski, Alina Landowska
Abstract:
How do Large Language Models understand moral dimensions compared to humans? This first large-scale Bayesian evaluation of market-leading language models provides the answer. In contrast to prior work using deterministic ground truth (majority or inclusion rules), we model annotator disagreements to capture both aleatoric uncertainty (inherent human disagreement) and epistemic uncertainty (model domain sensitivity). We evaluated the best language models (Claude Sonnet 4, DeepSeek-V3, Llama 4 Maverick) across 250K+ annotations from nearly 700 annotators in 100K+ texts spanning social networks, news and forums. Our GPU-optimized Bayesian framework processed 1M+ model queries, revealing that AI models typically rank among the top 25\% of human annotators, performing much better than average balanced accuracy. Importantly, we find that AI produces far fewer false negatives than humans, highlighting their more sensitive moral detection capabilities.
Authors:Minju Baeck, Yoonseok Shin, Dooyoung Kim, Hyunjin Lee, Sang Ho Yoon, Woontack Woo
Abstract:
We propose a visuo-tactile feedback method that combines virtual hand visualization and fingertip vibrations to modulate affective roughness perception in VR. While prior work has focused on object-based textures and vibrotactile feedback, the role of visual feedback on virtual hands remains underexplored. Our approach introduces affective visual cues including line shape, motion, and color applied to hand outlines, and examines their influence on both affective responses (arousal, valence) and perceived roughness. Results show that sharp contours enhanced perceived roughness, increased arousal, and reduced valence, intensifying the emotional impact of haptic feedback. In contrast, color affected valence only, with red consistently lowering emotional positivity. These effects were especially noticeable at lower haptic intensities, where visual cues extended affective modulation into mid-level perceptual ranges. Overall, the findings highlight how integrating expressive visual cues with tactile feedback can enrich affective rendering and offer flexible emotional tuning in immersive VR interactions.
Authors:Di Hu, Yawen Guo, Ha Na Cho, Emilie Chow, Dana B. Mukamel, Dara Sorkin, Andrew Reikes, Danielle Perret, Deepti Pandita, Kai Zheng
Abstract:
The increasing burden of responding to large volumes of patient messages has become a key factor contributing to physician burnout. Generative AI (GenAI) shows great promise to alleviate this burden by automatically drafting patient message replies. The ethical implications of this use have however not been fully explored. To address this knowledge gap, we conducted a semi-structured interview study with 21 physicians who participated in a GenAI pilot program. We found that notable ethical considerations expressed by the physician participants included human oversight as ethical safeguard, transparency and patient consent of AI use, patient misunderstanding of AI's role, and patient privacy and data security as prerequisites. Additionally, our findings suggest that the physicians believe the ethical responsibility of using GenAI in this context primarily lies with users, not with the technology. These findings may provide useful insights into guiding the future implementation of GenAI in clinical practice.
Authors:Jana Gonnermann-Müller, Jennifer Haase, Konstantin Fackeldey, Sebastian Pokutta
Abstract:
The increasing heterogeneity of student populations poses significant challenges for teachers, particularly in mathematics education, where cognitive, motivational, and emotional differences strongly influence learning outcomes. While AI-driven personalization tools have emerged, most remain performance-focused, offering limited support for teachers and neglecting broader pedagogical needs. This paper presents the FACET framework, a teacher-facing, large language model (LLM)-based multi-agent system designed to generate individualized classroom materials that integrate both cognitive and motivational dimensions of learner profiles. The framework comprises three specialized agents: (1) learner agents that simulate diverse profiles incorporating topic proficiency and intrinsic motivation, (2) a teacher agent that adapts instructional content according to didactical principles, and (3) an evaluator agent that provides automated quality assurance. We tested the system using authentic grade 8 mathematics curriculum content and evaluated its feasibility through a) automated agent-based assessment of output quality and b) exploratory feedback from K-12 in-service teachers. Results from ten internal evaluations highlighted high stability and alignment between generated materials and learner profiles, and teacher feedback particularly highlighted structure and suitability of tasks. The findings demonstrate the potential of multi-agent LLM architectures to provide scalable, context-aware personalization in heterogeneous classroom settings, and outline directions for extending the framework to richer learner profiles and real-world classroom trials.
Authors:Rosiana Natalie, Wenqian Xu, Ruei-Che Chang, Rada Mihalcea, Anhong Guo
Abstract:
Advances in vision language models (VLMs) have enabled the simulation of general human behavior through their reasoning and problem solving capabilities. However, prior research has not investigated such simulation capabilities in the accessibility domain. In this paper, we evaluate the extent to which VLMs can simulate the vision perception of low vision individuals when interpreting images. We first compile a benchmark dataset through a survey study with 40 low vision participants, collecting their brief and detailed vision information and both open-ended and multiple-choice image perception and recognition responses to up to 25 images. Using these responses, we construct prompts for VLMs (GPT-4o) to create simulated agents of each participant, varying the included information on vision information and example image responses. We evaluate the agreement between VLM-generated responses and participants' original answers. Our results indicate that VLMs tend to infer beyond the specified vision ability when given minimal prompts, resulting in low agreement (0.59). The agreement between the agent' and participants' responses remains low when only either the vision information (0.59) or example image responses (0.59) are provided, whereas a combination of both significantly increase the agreement (0.70, p < 0.0001). Notably, a single example combining both open-ended and multiple-choice responses, offers significant performance improvements over either alone (p < 0.0001), while additional examples provided minimal benefits (p > 0.05).
Authors:Ambre Assor, Mickael Sereno, Jean-Daniel Fekete
Abstract:
We present ParcoursVis, an open-source Progressive Visual Analytics tool designed to explore aggregated electronic health record sequences of patients at scale. Existing tools are limited to about 20k patients that they can process fast enough to remain interactive, under human latency limits. They need to process the whole dataset before showing the visualization, taking a time proportional to the data size. Yet, managing large datasets allows for discovering rare medical conditions and unexpected patient pathways, contributing to improving treatments. To overcome this limitation, ParcoursVis relies on a progressive aggregation algorithm that quickly computes an approximate initial result, visualized as an Icicle tree, and improves it iteratively, until the whole computation is done. With its architecture, ParcoursVis remains interactive while visualizing the sequences of millions of patients -- three orders of magnitude more than similar tools. We describe our PVA architecture, which achieves scalability with fast convergence and visual stability.
Authors:Sanchita S. Kamath, Aziz N. Zeidieh, JooYoung Seo
Abstract:
Blind and low-vision (BLV) users remain largely excluded from three-dimensional (3D) surface and point data visualizations due to the reliance on visual interaction. Existing approaches inadequately support non-visual access, especially in browser-based environments. This study introduces DIXTRAL, a hosted web-native system, co-designed with BLV researchers to address these gaps through multimodal interaction. Conducted with two blind and one sighted researcher, this study took place over sustained design sessions. Data were gathered through iterative testing of the prototype, collecting feedback on spatial navigation, sonification, and usability. Co-design observations demonstrate that synchronized auditory, visual, and textual feedback, combined with keyboard and gamepad navigation, enhances both structure discovery and orientation. DIXTRAL aims to improve access to 3D continuous scalar fields for BLV users and inform best practices for creating inclusive 3D visualizations.
Authors:Leonardo Ferreira, Gustavo Moreira, Fabio Miranda
Abstract:
Designing and building visual analytics (VA) systems is a complex, iterative process that requires the seamless integration of data processing, analytics capabilities, and visualization techniques. While prior research has extensively examined the social and collaborative aspects of VA system authoring, the practical challenges of developing these systems remain underexplored. As a result, despite the growing number of VA systems, there are only a few structured knowledge bases to guide their design and development. To tackle this gap, we propose VA-Blueprint, a methodology and knowledge base that systematically reviews and categorizes the fundamental building blocks of urban VA systems, a domain particularly rich and representative due to its intricate data and unique problem sets. Applying this methodology to an initial set of 20 systems, we identify and organize their core components into a multi-level structure, forming an initial knowledge base with a structured blueprint for VA system development. To scale this effort, we leverage a large language model to automate the extraction of these components for other 81 papers (completing a corpus of 101 papers), assessing its effectiveness in scaling knowledge base construction. We evaluate our method through interviews with experts and a quantitative analysis of annotation metrics. Our contributions provide a deeper understanding of VA systems' composition and establish a practical foundation to support more structured, reproducible, and efficient system development. VA-Blueprint is available at https://urbantk.org/va-blueprint.
Authors:Gustavo Moreira, Leonardo Ferreira, Carolina Veiga, Maryam Hosseini, Fabio Miranda
Abstract:
With the growing availability of urban data and the increasing complexity of societal challenges, visual analytics has become essential for deriving insights into pressing real-world problems. However, analyzing such data is inherently complex and iterative, requiring expertise across multiple domains. The need to manage diverse datasets, distill intricate workflows, and integrate various analytical methods presents a high barrier to entry, especially for researchers and urban experts who lack proficiency in data management, machine learning, and visualization. Advancements in large language models offer a promising solution to lower the barriers to the construction of analytics systems by enabling users to specify intent rather than define precise computational operations. However, this shift from explicit operations to intent-based interaction introduces challenges in ensuring alignment throughout the design and development process. Without proper mechanisms, gaps can emerge between user intent, system behavior, and analytical outcomes. To address these challenges, we propose Urbanite, a framework for human-AI collaboration in urban visual analytics. Urbanite leverages a dataflow-based model that allows users to specify intent at multiple scopes, enabling interactive alignment across the specification, process, and evaluation stages of urban analytics. Based on findings from a survey to uncover challenges, Urbanite incorporates features to facilitate explainability, multi-resolution definition of tasks across dataflows, nodes, and parameters, while supporting the provenance of interactions. We demonstrate Urbanite's effectiveness through usage scenarios created in collaboration with urban experts. Urbanite is available at https://urbantk.org/urbanite.
Authors:Hyo Jin Do, Werner Geyer
Abstract:
Large language models are known to produce outputs that are plausible but factually incorrect. To prevent people from making erroneous decisions by blindly trusting AI, researchers have explored various ways of communicating factuality estimates in AI-generated outputs to end-users. However, little is known about whether revealing content estimated to be factually incorrect influences users' trust when compared to hiding it altogether. We tested four different ways of disclosing an AI-generated output with factuality assessments: transparent (highlights less factual content), attention (highlights factual content), opaque (removes less factual content), ambiguity (makes less factual content vague), and compared them with a baseline response without factuality information. We conducted a human subjects research (N = 148) using the strategies in question-answering scenarios. We found that the opaque and ambiguity strategies led to higher trust while maintaining perceived answer quality, compared to the other strategies. We discuss the efficacy of hiding presumably less factual content to build end-user trust.
Authors:Dooyoung Kim, Jinseok Hong, Heejeong Ko, Woontack Woo
Abstract:
We proposed viewpoint-tolerant shared depth perception without individual tracking by leveraging human cognitive compensation in universally 3D rendered images on a wall-sized display. While traditional 3D perception-enabled display systems have primarily focused on single-user scenarios-adapting rendering based on head and eye tracking the use of wall-sized displays to extend spatial experiences and support perceptually coherent multi-user interactions remains underexplored. We investigated the effects of virtual depths (dv) and absolute viewing distance (da) on human cognitive compensation factors (perceived distance difference, viewing angle threshold, and perceived presence) to construct the wall display-based eXtended Reality (XR) space. Results show that participants experienced a compelling depth perception even from off-center angles of 23 to 37 degrees, and largely increasing virtual depth worsens depth perception and presence factors, highlighting the importance of balancing extended depth of virtual space and viewing distance from the wall-sized display. Drawing on these findings, wall-sized displays in venues such as museums, galleries, and classrooms can evolve beyond 2D information sharing to offer immersive, spatially extended group experiences without individualized tracking or wearables.
Authors:Hyo Jin Do, Rachel Ostrand, Werner Geyer, Keerthiram Murugesan, Dennis Wei, Justin Weisz
Abstract:
Large language models (LLMs) are susceptible to generating inaccurate or false information, often referred to as "hallucinations" or "confabulations." While several technical advancements have been made to detect hallucinated content by assessing the factuality of the model's responses, there is still limited research on how to effectively communicate this information to users. To address this gap, we conducted two scenario-based experiments with a total of 208 participants to systematically compare the effects of various design strategies for communicating factuality scores by assessing participants' ratings of trust, ease in validating response accuracy, and preference. Our findings reveal that participants preferred and trusted a design in which all phrases within a response were color-coded based on factuality scores. Participants also found it easier to validate accuracy of the response in this style compared to a baseline with no style applied. Our study offers practical design guidelines for LLM application developers and designers, aimed at calibrating user trust, aligning with user preferences, and enhancing users' ability to scrutinize LLM outputs.
Authors:Xueyuan Xu, Tianze Yu, Wenjia Dong, Fulin Wei, Li Zhuo
Abstract:
Recently, multi-modal physiological signals based emotion recognition has garnered increasing attention in the field of brain-computer interfaces. Nevertheness, the associated multi-modal physiological features are often high-dimensional and inevitably include irrelevant, redundant, and noisy representation, which can easily lead to overfitting, poor performance, and high computational complexity in emotion classifiers. Feature selection has been widely applied to address these challenges. However, previous studies generally assumed that multi-modal physiological data are complete, whereas in reality, the data are often incomplete due to the openness of the acquisition and operational environment. For example, a part of samples are available in several modalities but not in others. To address this issue, we propose a novel method for incomplete multi-modal physiological signal feature selection called adaptive shared latent structure learning (ASLSL). Based on the property that similar features share similar emotional labels, ASLSL employs adaptive shared latent structure learning to explore a common latent space shared for incomplete multi-modal physiological signals and multi-dimensional emotional labels, thereby mitigating the impact of missing information and mining consensus information. Two most popular multi-modal physiological emotion datasets (DEAP and DREAMER) with multi-dimensional emotional labels were utilized to compare the performance between compare ASLSL and seventeen feature selection methods. Comprehensive experimental results on these datasets demonstrate the effectiveness of ASLSL.
Authors:Xueyuan Xu, Wenjia Dong, Fulin Wei, Li Zhuo
Abstract:
The affective brain-computer interface is a crucial technology for affective interaction and emotional intelligence, emerging as a significant area of research in the human-computer interaction. Compared to single-type features, multi-type EEG features provide a multi-level representation for analyzing multi-dimensional emotions. However, the high dimensionality of multi-type EEG features, combined with the relatively small number of high-quality EEG samples, poses challenges such as classifier overfitting and suboptimal real-time performance in multi-dimensional emotion recognition. Moreover, practical applications of affective brain-computer interface frequently encounters partial absence of multi-dimensional emotional labels due to the open nature of the acquisition environment, and ambiguity and variability in individual emotion perception. To address these challenges, this study proposes a novel EEG feature selection method for missing multi-dimensional emotion recognition. The method leverages adaptive orthogonal non-negative matrix factorization to reconstruct the multi-dimensional emotional label space through second-order and higher-order correlations, which could reduce the negative impact of missing values and outliers on label reconstruction. Simultaneously, it employs least squares regression with graph-based manifold learning regularization and global feature redundancy minimization regularization to enable EEG feature subset selection despite missing information, ultimately achieving robust EEG-based multi-dimensional emotion recognition. Simulation experiments on three widely used multi-dimensional emotional datasets, DREAMER, DEAP and HDED, reveal that the proposed method outperforms thirteen advanced feature selection methods in terms of robustness for EEG emotional feature selection.
Authors:Albert Yu, Chengshu Li, Luca Macesanu, Arnav Balaji, Ruchira Ray, Raymond Mooney, Roberto MartÃn-MartÃn
Abstract:
Effective robotic systems for long-horizon human-robot collaboration must adapt to a wide range of human partners, whose physical behavior, willingness to assist, and understanding of the robot's capabilities may change over time. This demands a tightly coupled communication loop that grants both agents the flexibility to propose, accept, or decline requests as they coordinate toward completing the task effectively. We apply a Mixed-Initiative dialog paradigm to Collaborative human-roBot teaming and propose MICoBot, a system that handles the common scenario where both agents, using natural language, take initiative in formulating, accepting, or rejecting proposals on who can best complete different steps of a task. To handle diverse, task-directed dialog, and find successful collaborative strategies that minimize human effort, MICoBot makes decisions at three levels: (1) a meta-planner considers human dialog to formulate and code a high-level collaboration strategy, (2) a planner optimally allocates the remaining steps to either agent based on the robot's capabilities (measured by a simulation-pretrained affordance model) and the human's estimated availability to help, and (3) an action executor decides the low-level actions to perform or words to say to the human. Our extensive evaluations in simulation and real-world -- on a physical robot with 18 unique human participants over 27 hours -- demonstrate the ability of our method to effectively collaborate with diverse human users, yielding significantly improved task success and user experience than a pure LLM baseline and other agent allocation models. See additional videos and materials at https://robin-lab.cs.utexas.edu/MicoBot/.
Authors:Katsiaryna Dzialets, Aleksandra Makeeva, Ilya Vlasov, Anna Potriasaeva, Aleksei Rostovskii, Yaroslav Golubev, Anastasiia Birillo
Abstract:
Computer science education is a dynamic field with many aspects that influence the learner's path. While these aspects are usually studied in depth separately, it is also important to carry out broader large-scale studies that touch on many topics, because they allow us to put different results into each other's perspective. Past large-scale surveys have provided valuable insights, however, the emergence of new trends (e.g., AI), new learning formats (e.g., in-IDE learning), and the increasing learner diversity highlight the need for an updated comprehensive study. To address this, we conducted a survey with 18,032 learners from 173 countries, ensuring diverse representation and exploring a wide range of topics - formal education, learning formats, AI usage, challenges, motivation, and more. This paper introduces the results of this survey as an open dataset, describes our methodology and the survey questions, and highlights, as a motivating example, three possible research directions within this data: challenges in learning, emerging formats, and insights into the in-IDE format. The dataset aims to support further research and foster advancements in computer education.
Authors:Wenjia Dong, Xueyuan Xu, Tianze Yu, Junming Zhang, Li Zhuo
Abstract:
Electroencephalogram (EEG)-based emotion recognition holds significant value in affective computing and brain-computer interfaces. However, in practical applications, EEG recordings are susceptible to the effects of various physiological artifacts. Current approaches typically treat denoising and emotion recognition as independent tasks using cascaded architectures, which not only leads to error accumulation, but also fails to exploit potential synergies between these tasks. Moreover, conventional EEG-based emotion recognition models often rely on the idealized assumption of "perfectly denoised data", lacking a systematic design for noise robustness. To address these challenges, a novel framework that deeply couples denoising and emotion recognition tasks is proposed for end-to-end noise-robust emotion recognition, termed as Feedback-Driven Collaborative Network for Denoising-Classification Nexus (FDC-Net). Our primary innovation lies in establishing a dynamic collaborative mechanism between artifact removal and emotion recognition through: (1) bidirectional gradient propagation with joint optimization strategies; (2) a gated attention mechanism integrated with frequency-adaptive Transformer using learnable band-position encoding. Two most popular EEG-based emotion datasets (DEAP and DREAMER) with multi-dimensional emotional labels were employed to compare the artifact removal and emotion recognition performance between FDC-Net and nine state-of-the-art methods. In terms of the denoising task, FDC-Net obtains a maximum correlation coefficient (CC) value of 96.30% on DEAP and a maximum CC value of 90.31% on DREAMER. In terms of the emotion recognition task under physiological artifact interference, FDC-Net achieves emotion recognition accuracies of 82.3+7.1% on DEAP and 88.1+0.8% on DREAMER.
Authors:Tianze Yu, Junming Zhang, Wenjia Dong, Xueyuan Xu, Li Zhuo
Abstract:
EEG based multi-dimension emotion recognition has attracted substantial research interest in human computer interfaces. However, the high dimensionality of EEG features, coupled with limited sample sizes, frequently leads to classifier overfitting and high computational complexity. Feature selection constitutes a critical strategy for mitigating these challenges. Most existing EEG feature selection methods assume complete multi-dimensional emotion labels. In practice, open acquisition environment, and the inherent subjectivity of emotion perception often result in incomplete label data, which can compromise model generalization. Additionally, existing feature selection methods for handling incomplete multi-dimensional labels primarily focus on correlations among various dimensions during label recovery, neglecting the correlation between samples in the label space and their interaction with various dimensions. To address these issues, we propose a novel incomplete multi-dimensional feature selection algorithm for EEG-based emotion recognition. The proposed method integrates an adaptive dual self-expression learning (ADSEL) with least squares regression. ADSEL establishes a bidirectional pathway between sample-level and dimension-level self-expression learning processes within the label space. It could facilitate the cross-sharing of learned information between these processes, enabling the simultaneous exploitation of effective information across both samples and dimensions for label reconstruction. Consequently, ADSEL could enhances label recovery accuracy and effectively identifies the optimal EEG feature subset for multi-dimensional emotion recognition.
Authors:Xueyuan Xu, Wenjia Dong, Fulin Wei, Li Zhuo
Abstract:
Due to the intracranial volume conduction effects, high-dimensional multi-channel electroencephalography (EEG) features often contain substantial redundant and irrelevant information. This issue not only hinders the extraction of discriminative emotional representations but also compromises the real-time performance. Feature selection has been established as an effective approach to address the challenges while enhancing the transparency and interpretability of emotion recognition models. However, existing EEG feature selection research overlooks the influence of latent EEG feature structures on emotional label correlations and assumes uniform importance across various channels, directly limiting the precise construction of EEG feature selection models for multi-dimensional affective computing. To address these limitations, a novel channel-wise EEG feature selection (CWEFS) method is proposed for multi-dimensional emotion recognition. Specifically, inspired by brain volume conduction effects, CWEFS integrates EEG emotional feature selection into a shared latent structure model designed to construct a consensus latent space across diverse EEG channels. To preserve the local geometric structure, this consensus space is further integrated with the latent semantic analysis of multi-dimensional emotional labels. Additionally, CWEFS incorporates adaptive channel-weight learning to automatically determine the significance of different EEG channels in the emotional feature selection task. The effectiveness of CWEFS was validated using three popular EEG datasets with multi-dimensional emotional labels. Comprehensive experimental results, compared against nineteen feature selection methods, demonstrate that the EEG feature subsets chosen by CWEFS achieve optimal emotion recognition performance across six evaluation metrics.
Authors:Mohammed Almutairi, Charles Chiang, Haoze Guo, Matthew Belcher, Nandini Banerjee, Maria Milkowski, Svitlana Volkova, Daniel Nguyen, Tim Weninger, Michael Yankoski, Trenton W. Ford, Diego Gomez-Zara
Abstract:
Simulating how team members collaborate within complex environments using Agentic AI is a promising approach to explore hypotheses grounded in social science theories and study team behaviors. We introduce VirtLab, a user-friendly, customizable, multi-agent, and scalable team simulation system that enables testing teams with LLM-based agents in spatial and temporal settings. This system addresses the current frameworks' design and technical limitations that do not consider flexible simulation scenarios and spatial settings. VirtLab contains a simulation engine and a web interface that enables both technical and non-technical users to formulate, run, and analyze team simulations without programming. We demonstrate the system's utility by comparing ground truth data with simulated scenarios.
Authors:Gloria Fernández-Nieto, Lele Sha, Yuheng Li, Yi-Shan Tsai, Guanliang Chen, Yinwei Wei, Weiqing Wang, Jinchun Wen, Shaveen Singh, Ivan Silva, Yuanfang Li, Dragan GasÄviÄ, Zachari Swiecki
Abstract:
Designing Knowledge Management Systems (KMSs) for higher education requires addressing complex human-technology interactions, especially where staff turnover and changing roles create ongoing challenges for reusing knowledge. While advances in process mining and Generative AI enable new ways of designing features to support knowledge management, existing KMSs often overlook the realities of educators' workflows, leading to low adoption and limited impact. This paper presents findings from a two-year human-centred design study with 108 higher education teachers, focused on the iterative co-design and evaluation of GoldMind, a KMS supporting in-the-flow knowledge management during digital teaching tasks. Through three design-evaluation cycles, we examined how teachers interacted with the system and how their feedback informed successive refinements. Insights are synthesised across three themes: (1) Technology Lessons from user interaction data, (2) Design Considerations shaped by co-design and usability testing, and (3) Human Factors, including cognitive load and knowledge behaviours, analysed using Epistemic Network Analysis.
Authors:Gloria Fernández-Nieto, Vanessa Echeverria, Yuheng Li, Yi-Shan Tsai, Lele Sha, Guanliang Chen, Dragan Gasevic, Zachari Swiecki
Abstract:
Knowledge Management is crucial for capturing and transferring expertise within universities, especially in high staff turnover contexts where expertise loss disrupts teaching. Documenting teachers' workflows is time-intensive and diverts experts from core responsibilities. Sequential Pattern Mining (SPM) leverages log data to identify expert workflows, offering an automated alternative to represent workflows but requiring transformation into intuitive formats for novice educators. This paper introduces Visual Process Representations (VPR), a design approach combining SPM, Knowledge Management processes, and storytelling techniques to convert expert log data into clear visualisations. We detail the design phases and report a study evaluating visual affordances (text lists vs. pictorial-style) and teachers' perceptions of four versions of the VPR with 160 higher teachers on Prolific. Results indicate improved task performance, usability, and engagement, particularly with enriched visuals, though process memorability and task time improvements were limited. The findings highlight VPR's potential to visualise workflows and support novice educators.
Authors:Md Sabbir Ahmed, Arafat Rahman, Mark Rucker, Laura E. Barnes
Abstract:
Social interactions are a fundamental part of daily life and play a critical role in well-being. As emerging technologies offer opportunities to unobtrusively monitor behavior, there is growing interest in using them to better understand social experiences. However, automatically detecting interactions, particularly via wearable devices, remains underexplored. Existing systems are often limited to controlled environments, constrained to in-person interactions, and rely on rigid assumptions such as the presence of two speakers within a fixed time window. These limitations reduce their generalizability to capture diverse real-world interactions. To address these challenges, we developed a real-time, on-watch system capable of detecting both in-person and virtual interactions. The system leverages transfer learning to detect foreground speech (FS) and infers interaction boundaries based upon FS and conversational cues like whispering. In a real-world evaluation involving 11 participants over a total of 38 days (Mean = 3.45 days, SD = 2.73), the system achieved an interaction detection accuracy of 73.18%. Follow-up with six participants indicated perfect recall for detecting interactions. These preliminary findings demonstrate the potential of our system to capture interactions in daily life, providing a foundation for applications such as personalized interventions targeting social anxiety.
Authors:Ruei-Che Chang, Rosiana Natalie, Wenqian Xu, Jovan Zheng Feng Yap, Anhong Guo
Abstract:
Recent advancements in large multimodal models have provided blind or visually impaired (BVI) individuals with new capabilities to interpret and engage with the real world through interactive systems that utilize live video feeds. However, the potential benefits and challenges of such capabilities to support diverse real-world assistive tasks remain unclear. In this paper, we present findings from an exploratory study with eight BVI participants. Participants used ChatGPT's Advanced Voice with Video, a state-of-the-art live video AI released in late 2024, in various real-world scenarios, from locating objects to recognizing visual landmarks, across unfamiliar indoor and outdoor environments. Our findings indicate that current live video AI effectively provides guidance and answers for static visual scenes but falls short in delivering essential live descriptions required in dynamic situations. Despite inaccuracies in spatial and distance information, participants leveraged the provided visual information to supplement their mobility strategies. Although the system was perceived as human-like due to high-quality voice interactions, assumptions about users' visual abilities, hallucinations, generic responses, and a tendency towards sycophancy led to confusion, distrust, and potential risks for BVI users. Based on the results, we discuss implications for assistive video AI agents, including incorporating additional sensing capabilities for real-world use, determining appropriate intervention timing beyond turn-taking interactions, and addressing ecological and safety concerns.
Authors:Ada Yi Zhao, Aditya Gunturu, Ellen Yi-Luen Do, Ryo Suzuki
Abstract:
Large language models (LLMs) have enabled the automatic generation of step-by-step augmented reality (AR) instructions for a wide range of physical tasks. However, existing LLM-based AR guidance often lacks rich visual augmentations to effectively embed instructions into spatial context for a better user understanding. We present Guided Reality, a fully automated AR system that generates embedded and dynamic visual guidance based on step-by-step instructions. Our system integrates LLMs and vision models to: 1) generate multi-step instructions from user queries, 2) identify appropriate types of visual guidance, 3) extract spatial information about key interaction points in the real world, and 4) embed visual guidance in physical space to support task execution. Drawing from a corpus of user manuals, we define five categories of visual guidance and propose an identification strategy based on the current step. We evaluate the system through a user study (N=16), completing real-world tasks and exploring the system in the wild. Additionally, four instructors shared insights on how Guided Reality could be integrated into their training workflows.
Authors:Iyad Rahwan, Azim Shariff, Jean-François Bonnefon
Abstract:
Predicting the social and behavioral impact of future technologies, before they are achieved, would allow us to guide their development and regulation before these impacts get entrenched. Traditionally, this prediction has relied on qualitative, narrative methods. Here we describe a method which uses experimental methods to simulate future technologies, and collect quantitative measures of the attitudes and behaviors of participants assigned to controlled variations of the future. We call this method 'science fiction science'. We suggest that the reason why this method has not been fully embraced yet, despite its potential benefits, is that experimental scientists may be reluctant to engage in work facing such serious validity threats as science fiction science. To address these threats, we consider possible constraints on the kind of technology that science fiction science may study, as well as the unconventional, immersive methods that science fiction science may require. We seek to provide perspective on the reasons why this method has been marginalized for so long, what benefits it would bring if it could be built on strong yet unusual methods, and how we can normalize these methods to help the diverse community of science fiction scientists to engage in a virtuous cycle of validity improvement.
Authors:Sanchita S. Kamath, Omar Khan, Anurag Choudhary, Jan Meyerhoff-Liang, Soyoung Choi, JooYoung Seo
Abstract:
Blind and low-vision (BLV) individuals experience lower levels of physical activity (PA) compared to sighted peers due to a lack of accessible, engaging exercise options. Existing solutions often rely on auditory cues but do not fully integrate rich sensory feedback or support spatial navigation, limiting their effectiveness. This study introduces PunchPulse, a virtual reality (VR) boxing exergame designed to motivate BLV users to reach and sustain moderate to vigorous physical activity (MVPA) levels. Over a seven-month, multi-phased study, PunchPulse was iteratively refined with three BLV co-designers, informed by two early pilot testers, and evaluated by six additional BLV user-study participants. Data collection included both qualitative (researcher observations, SOPI) and quantitative (MVPA zones, aid usage, completion times) measures of physical exertion and gameplay performance. The user study revealed that all participants reached moderate MVPA thresholds, with high levels of immersion and engagement observed. This work demonstrates the potential of VR as an inclusive medium for promoting meaningful PA in the BLV community and addresses a critical gap in accessible, intensity-driven exercise interventions.
Authors:Antrea Christou, Cogan Shimizu
Abstract:
Navigating, visualizing, and discovery in graph data is frequently a difficult prospect. This is especially true for knowledge graphs (KGs), due to high number of possible labeled connections to other data. However, KGs are frequently equipped with an ontology as a schema. That is, it informs how the relationships between data may be constrained. This additional information can be leveraged to improve how (knowledge) graph data can be navigated, visualized, or otherwise utilized in a discovery process. In this manuscript, we introduce the Interactive Knowledge (InK) Browser. This tool specifically takes advantage ontological information (i.e., knowledge) when found in KGs. Specifically, we use modular views that provide various perspectives over the graph, including an interactive schema view, data listings based on type, neighborhood connections, and geospatial depiction (where appropriate). For this manuscript, we have evaluated the basic premise of this tool over a user group ($n= With this grown user survey, we continue to evaluate how scalable tools, including flexible views, can make KG exploration easier for a range of applications.)
Authors:Jiawei Li, Linjie Qiu, Zhiqing Wu, Qiongyan Chen, Ziyan Wang, Mingming Fan
Abstract:
Older adults tend to encounter challenges when learning to use new smartphone apps due to age-related cognitive and physical changes. Compared to traditional support methods such as video tutorials, trial-and-error allows older adults to learn to use smartphone apps by making and correcting mistakes. However, it remains unknown how trial-and-error should be designed to empower older adults to use smartphone apps and how well it would work for older adults. Informed by the guidelines derived from prior work, we designed and implemented ExplorAR, an AR-based trial-and-error system that offers real-time and situated visual guidance in the augmented space around the smartphone to empower older adults to explore and correct mistakes independently. We conducted a user study with 18 older adults to compare ExplorAR with traditional video tutorials and a simplified version of ExplorAR. Results show that the AR-supported trial-and-error method enhanced older adults' learning experience by fostering deeper cognitive engagement and improving confidence in exploring unknown operations.
Authors:Jiajun Zhu, Xinyu Cheng, Zhongsu Luo, Yunfan Zhou, Xinhuan Shu, Di Weng, Yingcai Wu
Abstract:
Large language models (LLMs) enable the rapid generation of data wrangling scripts based on natural language instructions, but these scripts may not fully adhere to user-specified requirements, necessitating careful inspection and iterative refinement. Existing approaches primarily assist users in understanding script logic and spotting potential issues themselves, rather than providing direct validation of correctness. To enhance debugging efficiency and optimize the user experience, we develop ViseGPT, a tool that automatically extracts constraints from user prompts to generate comprehensive test cases for verifying script reliability. The test results are then transformed into a tailored Gantt chart, allowing users to intuitively assess alignment with semantic requirements and iteratively refine their scripts. Our design decisions are informed by a formative study (N=8) that explores user practices and challenges. We further evaluate the effectiveness and usability of ViseGPT through a user study (N=18). Results indicate that ViseGPT significantly improves debugging efficiency for LLM-generated data-wrangling scripts, enhances users' ability to detect and correct issues, and streamlines the workflow experience.
Authors:Maryam Cheema, Sina Elahimanesh, Samuel Martin, Pooyan Fazli, Hasti Seifi
Abstract:
Audio description (AD) makes video content accessible to millions of blind and low vision (BLV) users. However, creating high-quality AD involves a trade-off between the precision of human-crafted descriptions and the efficiency of AI-generated ones. To address this, we present DescribePro a collaborative AD authoring system that enables describers to iteratively refine AI-generated descriptions through multimodal large language model prompting and manual editing. DescribePro also supports community collaboration by allowing users to fork and edit existing ADs, enabling the exploration of different narrative styles. We evaluate DescribePro with 18 describers (9 professionals and 9 novices) using quantitative and qualitative methods. Results show that AI support reduces repetitive work while helping professionals preserve their stylistic choices and easing the cognitive load for novices. Collaborative tags and variations show potential for providing customizations, version control, and training new describers. These findings highlight the potential of collaborative, AI-assisted tools to enhance and scale AD authorship.
Authors:Subin Park, Jeonghyun Kim, Jeanne Choi, Joseph Seering, Uichin Lee, Sung-Ju Lee
Abstract:
Hate speech remains a persistent and unresolved challenge in online platforms. Content moderators, working on the front lines to review user-generated content and shield viewers from hate speech, often find themselves unprotected from the mental burden as they continuously engage with offensive language. To safeguard moderators' mental well-being, we designed HateBuffer, which anonymizes targets of hate speech, paraphrases offensive expressions into less offensive forms, and shows the original expressions when moderators opt to see them. Our user study with 80 participants consisted of a simulated hate speech moderation task set on a fictional news platform, followed by semi-structured interviews. Although participants rated the hate severity of comments lower while using HateBuffer, contrary to our expectations, they did not experience improved emotion or reduced fatigue compared with the control group. In interviews, however, participants described HateBuffer as an effective buffer against emotional contagion and the normalization of biased opinions in hate speech. Notably, HateBuffer did not compromise moderation accuracy and even contributed to a slight increase in recall. We explore possible explanations for the discrepancy between the perceived benefits of HateBuffer and its measured impact on mental well-being. We also underscore the promise of text-based content modification techniques as tools for a healthier content moderation environment.
Authors:Dominique Geissler, Claire Robertson, Stefan Feuerriegel
Abstract:
Deepfakes, i.e., images generated by artificial intelligence (AI), can erode trust in institutions and compromise election outcomes, as people often struggle to discern real images from deepfakes. Improving digital literacy can help address these challenges, yet scalable and effective approaches remain largely unexplored. Here, we compare the efficacy of five digital literacy interventions to boost people's ability to discern deepfakes: (1) textual guidance on common indicators of deepfakes; (2) visual demonstrations of these indicators; (3) a gamified exercise for identifying deepfakes; (4) implicit learning through repeated exposure and feedback; and (5) explanations of how deepfakes are generated with the help of AI. We conducted an experiment with N=1,200 participants from the United States to test the immediate and long-term effectiveness of our interventions. Our results show that our interventions can boost deepfake discernment by up to 13 percentage points while maintaining trust in real images. Altogether, our approach is scalable, suitable for diverse populations, and highly effective for boosting deepfake detection while maintaining trust in truthful information.
Authors:Tom Sheffer, Alon Miron, Yaniv Dover, Ariel Goldstein
Abstract:
Conversations transform individual knowledge into collective insight, allowing groups of humans and increasingly groups of artificial intelligence (AI) agents to collaboratively solve complex problems. Whether interactions between AI agents can replicate the synergy observed in human discussions remains an open question. To investigate this, we systematically compared four conversational configurations: pairs of large language models (LLM-LLM), trios of LLMs, trios of humans, and mixed human-LLM pairs. After agents answered questions individually, they engaged in open-ended discussions and then reconsidered their initial answers. Interactions involving humans consistently led to accuracy improvements after the conversations, benefiting both stronger and weaker participants. By contrast, purely LLM-based pairs and trios exhibited declines in accuracy, demonstrating limited conversational synergy. Analysis of participants' confidence and answer-switching behavior revealed that knowledge diversity is a critical factor enabling collaborative improvement. Crucially, the lack of gains in LLM-LLM interactions did not stem from a fundamental limitation of the models' ability to collaborate, but from highly similar knowledge states that left little room for productive exchange. Our findings argue for a paradigm shift in AI development: rather than optimizing individual models solely for standalone performance, explicitly cultivating diversity across agents, even at the cost of slightly lower individual accuracy, may yield AI collaborators that are more effective in group settings with humans or other AI systems.
Authors:Victoria Chang, Caro Williams-Pierce, Huaishu Peng, Ge Gao
Abstract:
Opportunistic interactions-the unstructured exchanges that emerge as individuals become aware of each other's presence-are essential for relationship building and information sharing in everyday life. Yet, fostering effective opportunistic interactions has proven challenging, especially at professional events that have increasingly transitioned from in person to online formats. In the current paper, we offer an in-depth qualitative account of how people initiate opportunistic interactions in social VR. Our participants consisted of 16 individuals with ongoing experience attending VR-mediated events in their professional communities. We conducted extensive observations with each participant during one or more events they attended. We also interviewed them after every observed event, obtaining self-reflections on their attempts to navigate opportunistic interactions with others. Our analysis revealed that participants sought to understand the extent to which social VR preserved the real-world meanings of various nonverbal cues, which we refer to as verisimilitude. We detailed the unique connections between a person's perceived verisimilitude and their social behaviors at each of the three steps toward initiating opportunistic interactions: availability recognition, attention capture, and ice-breaking. Across these steps, the VR platform typically replaces complex social mechanisms with feasible technical ones in order to function, thereby altering the preconditions necessary for a nonverbal cue's social meanings to remain intact. We identified a rich set of strategies that participants developed to assess verisimilitude and act upon it, while also confirming a lack of systematic knowledge guiding their practices. Based on these findings, we provide actionable insights for social VR platform design that can best support the initiation of opportunistic interactions for professional purposes.
Authors:Cynthia Zastudil, Christine Holyfield, Christine Kapp, Kate Hamilton, Kriti Baru, Liam Newsam, June A. Smith, Stephen MacNeil
Abstract:
Augmentative and alternative communication (AAC) devices are used by many people around the world who experience difficulties in communicating verbally. One AAC device which is especially useful for minimally verbal autistic children in developing language and communication skills are visual scene displays (VSD). VSDs use images with interactive hotspots embedded in them to directly connect language to real-world contexts which are meaningful to the AAC user. While VSDs can effectively support emergent communicators, their widespread adoption is impacted by how difficult these devices are to configure. We developed a prototype that uses generative AI to automatically suggest initial hotspots on an image to help non-experts efficiently create VSDs. We conducted a within-subjects user study to understand how effective our prototype is in supporting non-expert users, specifically pre-service speech-language pathologists (SLP) who are not familiar with VSDs as an AAC intervention. Pre-service SLPs are actively studying to become clinically certified SLPs and have domain-specific knowledge about language and communication skill development. We evaluated the effectiveness of our prototype based on creation time, quality, and user confidence. We also analyzed the relevance and developmental appropriateness of the automatically generated hotspots and how often users interacted with the generated hotspots. Our results were mixed with SLPs becoming more efficient and confident. However, there were multiple negative impacts as well, including over-reliance and homogenization of communication options. The implications of these findings reach beyond the domain of AAC, especially as generative AI becomes more prevalent across domains, including assistive technology. Future work is needed to further identify and address these risks associated with integrating generative AI into assistive technology.
Authors:Kentaro Takahira, Yue Yu, Takanori Fujiwara, Ryo Suzuki, Huamin Qu
Abstract:
Augmented data storytelling enhances narrative delivery by integrating visualizations with physical environments and presenter actions. Existing systems predominantly rely on body gestures or speech to control visualizations, leaving interactions with physical objects largely underexplored. We introduce augmented physical data storytelling, an approach enabling presenters to manipulate visualizations through physical object interactions. To inform this approach, we first conducted a survey of data-driven presentations to identify common visualization commands. We then conducted workshops with nine HCI/VIS researchers to collect mappings between physical manipulations and these commands. Guided by these insights, we developed InSituTale, a prototype that combines object tracking via a depth camera with Vision-LLM for detecting real-world events. Through physical manipulations, presenters can dynamically execute various visualization commands, delivering cohesive data storytelling experiences that blend physical and digital elements. A user study with 12 participants demonstrated that InSituTale enables intuitive interactions, offers high utility, and facilitates an engaging presentation experience.
Authors:Barbara Karpowicz, Tomasz Kowalewski, Pavlo Zinevych, Adam KuzdraliÅski, Grzegorz Marcin Wójcik, WiesÅaw KopeÄ
Abstract:
This paper proposes an eye tracking module for the XR Space Framework aimed at enhancing human performance in XR-based applications, specifically in training, screening, and teleoperation. This framework provides a methodology and components that streamline the development of adaptive real-time virtual immersive systems. It contains multimodal measurements - declarative in the form of in-VR questionnaires and objective, including eye tracking, body movement, and psychophysiological data (e.g., ECG, GSR, PPG). A key focus of this paper is the integration of real-time eye tracking data into XR environments to facilitate a biofeedback loop, providing insight into user attention, cognitive load, and engagement. Given the relatively high measurement frequency of eye tracking - recognized as a noninvasive yet robust psychophysiological measure - this technology is particularly well suited for real-time adjustments in task difficulty and feedback to enhance learning and operational effectiveness. Despite its established role in cognitive and attentional studies, implementing eye tracking metrics within dynamic, real-time XR environments poses unique challenges, particularly given the complex moving visuals presented in head-mounted displays (HMDs). This paper addresses these challenges by focusing on the essential aspects of integrating eye tracking in immersive systems based on real-time engines, ultimately facilitating more efficient, adaptive XR applications.
Authors:Kian Mahmoodi, Yudong Xie, Tan Gemicioglu, Chi-Jung Lee, Jiwan Kim, Cheng Zhang
Abstract:
Grip force is commonly used as an overall health indicator in older adults and is valuable for tracking progress in physical training and rehabilitation. Existing methods for wearable grip force measurement are cumbersome and user-dependent, making them insufficient for practical, continuous grip force measurement. We introduce EchoForce, a novel wristband using acoustic sensing for low-cost, non-contact measurement of grip force. EchoForce captures acoustic signals reflected from subtle skin deformations by flexor muscles on the forearm. In a user study with 11 participants, EchoForce achieved a fine-tuned user-dependent mean error rate of 9.08% and a user-independent mean error rate of 12.3% using a foundation model. Our system remained accurate between sessions, hand orientations, and users, overcoming a significant limitation of past force sensing systems. EchoForce makes continuous grip force measurement practical, providing an effective tool for health monitoring and novel interaction techniques.
Authors:Panayu Keelawat, David Barron, Kaushik Narasimhan, Daniel Manesh, Xiaohang Tang, Xi Chen, Sang Won Lee, Yan Chen
Abstract:
Facilitating class-wide debriefings after small-group discussions is a common strategy in ethics education. Instructor interviews revealed that effective debriefings should highlight frequently discussed themes and surface underrepresented viewpoints, making accurate representations of insight occurrence essential. Yet authoring presentations in real time is cognitively overwhelming due to the volume of data and tight time constraints. We present Dynamite, an AI-assisted system that enables semantic updates to instructor-authored slides during live classroom discussions. These updates are powered by semantic data binding, which links slide content to evolving discussion data, and semantic suggestions, which offer revision options aligned with pedagogical goals. In a within-subject in-lab study with 12 participants, Dynamite outperformed a text-based AI baseline in content accuracy and quality. Participants used voice and sketch input to quickly organize semantic blocks, then applied suggestions to accelerate refinement as data stabilized.
Authors:Aliaksandr Marozau, Barbara Karpowicz, Tomasz Kowalewski, Pavlo Zinevych, Wiktor Stawski, Adam KuzdraliÅski, WiesÅaw KopeÄ
Abstract:
Mixed Reality (MR) technologies such as Virtual and Augmented Reality (VR, AR) are well established in medical practice, enhancing diagnostics, treatment, and education. However, there are still some limitations and challenges that may be overcome thanks to the latest generations of equipment, software, and frameworks based on eXtended Reality (XR) by enabling immersive systems that support safer, more controlled environments for training and patient care. Our review highlights recent VR and AR applications in key areas of medicine. In medical education, these technologies provide realistic clinical simulations, improving skills and knowledge retention. In surgery, immersive tools enhance procedural precision with detailed anatomical visualizations. VR-based rehabilitation has shown effectiveness in restoring motor functions and balance, particularly for neurological patients. In mental health, VR has been successful in treating conditions like PTSD and phobias. Although VR and AR solutions are well established, there are still some important limitations, including high costs and limited tactile feedback, which may be overcome with implementing new technologies that may improve the effectiveness of immersive medical applications such as XR, psychophysiological feedback or integration of artificial intelligence (AI) for real-time data analysis and personalized healthcare and training.
Authors:Eshta Bhardwaj, Han Qiao, Christoph Becker
Abstract:
Policy decisions relevant to the environment rely on tools like dashboards, risk models, and prediction models to provide information and data visualizations that enable decision-makers to make trade-offs. The conventional paradigm of data visualization practices for policy and decision-making is to convey data in a supposedly neutral, objective manner for rational decision-makers. Feminist critique advocates for nuanced and reflexive approaches that take into account situated decision-makers and their affective relationships to data. This paper sheds light on a key cognitive aspect that impacts how decision-makers interpret data. Because all outcomes from policies relevant to climate change occur at a distance, decision-makers experience so-called `psychological distance' to environmental decisions in terms of space, time, social identity, and hypotheticality. This profoundly impacts how they perceive and evaluate outcomes. Since policy decisions to achieve a safe planetary space are urgently needed for immediate transition and change, we need a design practice that takes into account how psychological distance affects cognition and decision-making. Our paper explores the role of alternative design approaches in developing visualizations used for climate policymaking. We conduct a literature review and synthesis which bridges psychological distance with speculative design and data visceralization by illustrating the value of affective design methods via examples from previous research. Through this work, we propose a novel premise for the communication and visualization of environmental data. Our paper lays out how future research on the impacts of alternative design approaches on psychological distance can make data used for policy decisions more tangible and visceral.
Authors:Luca-Maxim Meinhardt, Enrico Rukzio
Abstract:
Highly Automated Vehicles (HAVs) can improve mobility for blind and visually impaired people (BVIPs). However, designing non-visual interfaces that enable them to maintain situation awareness inside the vehicle is a challenge. This paper presents two of our participatory design workshops that explored what information BVIPs need in HAVs and what an interface that meets these needs might look like. Based on the participants' insights, we created final systems to improve their situation awareness. The two workshops used different approaches: in the first, participants built their own low-fidelity prototypes; in the second, they evaluated and discussed the initial prototypes we provided. We will outline how each workshop was set up and share lessons learned about prototyping methods for BVIPs and how they could be improved.
Authors:Victoria Chang, Caro Williams-Pierce, Huaishu Peng, Ge Gao
Abstract:
Social VR introduces new ethical challenges for observational research. The current paper presents a narrative literature review of ethical considerations in observational methods, with a focus on work in HCI. We examine how unobtrusive or selectively disclosed observation is implemented in public face-to-face and social VR settings. Our review extends ethical discussions from traditional public research into the context of social VR, highlighting tensions between observer visibility, data traceability, and participant autonomy. Drawing on insights distilled from prior literature, we propose five constructive guidelines for ethical observational research in public social VR environments. Our work offers key implications for future research, addressing anticipated improvements in platform design, the management of researcher presence, and the development of community-informed consent mechanisms.
Authors:Danqing Shi, Furui Cheng, Tino Weinkauf, Antti Oulasvirta, Mennatallah El-Assady
Abstract:
Human preferences are widely used to align large language models (LLMs) through methods such as reinforcement learning from human feedback (RLHF). However, the current user interfaces require annotators to compare text paragraphs, which is cognitively challenging when the texts are long or unfamiliar. This paper contributes by studying the decomposition principle as an approach to improving the quality of human feedback for LLM alignment. This approach breaks down the text into individual claims instead of directly comparing two long-form text responses. Based on the principle, we build a novel user interface DxHF. It enhances the comparison process by showing decomposed claims, visually encoding the relevance of claims to the conversation and linking similar claims. This allows users to skim through key information and identify differences for better and quicker judgment. Our technical evaluation shows evidence that decomposition generally improves feedback accuracy regarding the ground truth, particularly for users with uncertainty. A crowdsourcing study with 160 participants indicates that using DxHF improves feedback accuracy by an average of 5%, although it increases the average feedback time by 18 seconds. Notably, accuracy is significantly higher in situations where users have less certainty. The finding of the study highlights the potential of HCI as an effective method for improving human-AI alignment.
Authors:Maciej Skorski, Alina Landowska
Abstract:
Moral foundation detection is crucial for analyzing social discourse and developing ethically-aligned AI systems. While large language models excel across diverse tasks, their performance on specialized moral reasoning remains unclear.
This study provides the first comprehensive comparison between state-of-the-art LLMs and fine-tuned transformers across Twitter and Reddit datasets using ROC, PR, and DET curve analysis.
Results reveal substantial performance gaps, with LLMs exhibiting high false negative rates and systematic under-detection of moral content despite prompt engineering efforts. These findings demonstrate that task-specific fine-tuning remains superior to prompting for moral reasoning applications.
Authors:Benjamin Watson, Neff Walker, William Ribarsky, Victoria Spaulding
Abstract:
System responsiveness (SR) is defined as the elapsed time until a system responds to user control. SR fluctuates over time, so it must be described statistically with mean (MSR) and standard deviation (SDSR). In this paper, we examine SR in virtual environments (VEs), outlining its components and methods of experimental measurement and manipulation. Three studies of MSR and SDSR effects on performance of grasp and placement tasks are then presented. The studies used within-subjects designs with 11, 12, and 10 participants, respectively. Results showed that SDSR affected performance only if it was above 82 ms. Placement required more frequent visual feedback and was more sensitive to SR. We infer that VE designers need not tightly control SDSR and may wish to vary SR control based on required visual feedback frequency. These results may be used to improve the human-computer interface in a wide range of interactive graphical applications, including scientific visualization, training, mental health, and entertainment.
Authors:Gege Ming, Weihua Pei, Sen Tian, Xiaogang Chen, Xiaorong Gao, Yijun Wang
Abstract:
Brain-computer interface (BCI) technology establishes a direct communication pathway between the brain and external devices. Current visual BCI systems suffer from insufficient information transfer rates (ITRs) for practical use. Spatial information, a critical component of visual perception, remains underexploited in existing systems because the limited spatial resolution of recording methods hinders the capture of the rich spatiotemporal dynamics of brain signals. This study proposed a frequency-phase-space fusion encoding method, integrated with 256-channel high-density electroencephalogram (EEG) recordings, to develop high-speed BCI systems. In the classical frequency-phase encoding 40-target BCI paradigm, the 256-66, 128-32, and 64-21 electrode configurations brought theoretical ITR increases of 83.66%, 79.99%, and 55.50% over the traditional 64-9 setup. In the proposed frequency-phase-space encoding 200-target BCI paradigm, these increases climbed to 195.56%, 153.08%, and 103.07%. The online BCI system achieved an average actual ITR of 472.7 bpm. This study demonstrates the essential role and immense potential of high-density EEG in decoding the spatiotemporal information of visual stimuli.
Authors:Yang Ouyang, Yuchen Wu, Xiyuan Wang, Laixin Xie, Weicong Cheng, Jianping Gan, Quan Li, Xiaojuan Ma
Abstract:
Communicating the complexity of oceanic phenomena-such as hypoxia and acidification-poses a persistent challenge for marine science. Despite advances in sensing technologies and computational models, conventional formats like static visualizations and text-based reports often fall short in conveying the dynamics of ocean changes. To address this gap, we present OceanVive, an immersive and interactive visualization system that transforms complex ocean datasets into navigable spatial narratives. OceanVive incorporates an exploratory panel on a table-sized tablet for managing immersive content on a large screen and integrates adaptive visual encodings, contextual storytelling, and intuitive navigation pathways to support effective communication. We validate the system through expert interviews, demonstrating its potential to enhance science communication and promote deeper public understanding.
Authors:Benjamin Watson, Victoria Spaulding, Neff Walker, William Ribarsky
Abstract:
We present a first study of the effects of frame time variations, in both deviation around mean frame times and period of fluctuation, on task performance in a virtual environment (VE). Chosen are open and closed loop tasks that are typical for current applications or likely to be prominent in future ones. The results show that at frame times in the range deemed acceptable for many applications, fairly large deviations in amplitude over a fairly wide range of periods do not significantly affect task performance. However, at a frame time often considered a minimum for immersive VR, frame time variations do produce significant effects on closed loop task performance. The results will be of use to designers of VEs and immersive applications, who often must control frame time variations due to large fluctuations of complexity (graphical and otherwise) in the VE.
Authors:Katelyn Morrison, Arpit Mathur, Aidan Bradshaw, Tom Wartmann, Steven Lundi, Afrooz Zandifar, Weichang Dai, Kayhan Batmanghelich, Motahhare Eslami, Adam Perer
Abstract:
As text-to-image generative models rapidly improve, AI researchers are making significant advances in developing domain-specific models capable of generating complex medical imagery from text prompts. Despite this, these technical advancements have overlooked whether and how medical professionals would benefit from and use text-to-image generative AI (GenAI) in practice. By developing domain-specific GenAI without involving stakeholders, we risk the potential of building models that are either not useful or even more harmful than helpful. In this paper, we adopt a human-centered approach to responsible model development by involving stakeholders in evaluating and reflecting on the promises, risks, and challenges of a novel text-to-CT Scan GenAI model. Through exploratory model prompting activities, we uncover the perspectives of medical students, radiology trainees, and radiologists on the role that text-to-CT Scan GenAI can play across medical education, training, and practice. This human-centered approach additionally enabled us to surface technical challenges and domain-specific risks of generating synthetic medical images. We conclude by reflecting on the implications of medical text-to-image GenAI.
Authors:Adrien Coppens, Valérie Maquil
Abstract:
To understand and quantify the quality of mixed-presence collaboration around wall-sized displays, robust evaluation methodologies are needed, that are adapted for a room-sized experience and are not perceived as obtrusive. In this paper, we propose our approach for measuring joint attention based on head gaze data. We describe how it has been implemented for a user study on mixed presence collaboration with two wall-sized displays and report on the insights we gained so far from its implementation, with a preliminary focus on the data coming from one particular session.
Authors:Lou Schwartz, Mohammad Ghoniem, Valérie Maquil, Adrien Coppens, Johannes Hermen
Abstract:
Wall-Sized Displays have spatial characteristics that are difficult to address during user interface design. The design at scale 1:1 could be part of the solution. In this paper, we present the results of two user studies and one technology review, exploring the usability of popular, desktop-optimized prototyping tools, for designing at scale on Wall-Sized Displays. We considered two wall-sized display setups, and three different interaction methods: touch, a keyboard equipped with a touchpad, and a tablet. We observed that designing at scale 1:1 was appreciated. Tablet-based interaction proved to be the most comfortable interaction method, and a mix of interaction modalities is promising. In addition, care must be given to the surrounding environment, such as furniture. We propose twelve design guidelines for a design tool dedicated to this specific context. Overall, existing user interface design tools do not yet fully support design on and for wall-sized displays and require further considerations in terms of placement of user interface elements and the provision of additional features.
Authors:Nuwan Janaka, Shengdong Zhao, Ashwin Ram, Ruoxin Sun, Sherisse Tan Jing Wen, Danae Li, David Hsu
Abstract:
The rapid evolution of lightweight consumer augmented reality (AR) smart glasses (a.k.a. optical see-through head-mounted displays) offers novel opportunities for learning, particularly through their unique capability to deliver multimodal information in just-in-time, micro-learning scenarios. This research investigates how such devices can support mobile second-language acquisition by presenting progressive sentence structures in multimodal formats. In contrast to the commonly used vocabulary (i.e., word) learning approach for novice learners, we present a "progressive presentation" method that combines both word and sentence learning by sequentially displaying sentence components (subject, verb, object) while retaining prior context. Pilot and formal studies revealed that progressive presentation enhances recall, particularly in mobile scenarios such as walking. Additionally, incorporating timed gaps between word presentations further improved learning effectiveness under multitasking conditions. Our findings demonstrate the utility of progressive presentation and provide usage guidelines for educational applications-even during brief, on-the-go learning moments.
Authors:Benjamin Watson, Neff Walker, Larry F Hodges, Aileen Worden
Abstract:
Two user studies were performed to evaluate the effect of level-of-detail (LOD) degradation in the periphery of head-mounted displays on visual search performance. In the first study, spatial detail was degraded by reducing resolution. In the second study, detail was degraded in the color domain by using grayscale in the periphery. In each study, 10 subjects were given a complex search task that required users to indicate whether or not a target object was present among distracters. Subjects used several different displays varying in the amount of detail presented. Frame rate, object location, subject input method, and order of display use were all controlled. The primary dependent measures were search time on correctly performed trials and the percentage of all trials correctly performed. Results indicated that peripheral LOD degradation can be used to reduce color or spatial visual complexity by almost half in some search tasks with out significantly reducing performance.
Authors:Hyo-Jeong Jang, Hye-Bin Shin, Seong-Whan Lee
Abstract:
Electroencephalography (EEG) is a fundamental modality for cognitive state monitoring in brain-computer interfaces (BCIs). However, it is highly susceptible to intrinsic signal errors and human-induced labeling errors, which lead to label noise and ultimately degrade model performance. To enhance EEG learning, multimodal knowledge distillation (KD) has been explored to transfer knowledge from visual models with rich representations to EEG-based models. Nevertheless, KD faces two key challenges: modality gap and soft label misalignment. The former arises from the heterogeneous nature of EEG and visual feature spaces, while the latter stems from label inconsistencies that create discrepancies between ground truth labels and distillation targets. This paper addresses semantic uncertainty caused by ambiguous features and weakly defined labels. We propose a novel cross-modal knowledge distillation framework that mitigates both modality and label inconsistencies. It aligns feature semantics through a prototype-based similarity module and introduces a task-specific distillation head to resolve label-induced inconsistency in supervision. Experimental results demonstrate that our approach improves EEG-based emotion regression and classification performance, outperforming both unimodal and multimodal baselines on a public multimodal dataset. These findings highlight the potential of our framework for BCI applications.
Authors:David Freire-Obregón, Agnieszka Dubiel, Prasoon Kumar Vinodkumar, Gholamreza Anbarjafari, Dorota KamiÅska, Modesto Castrillón-Santana
Abstract:
Recent advances have shown promise in emotion recognition from electroencephalogram (EEG) signals by employing bi-hemispheric neural architectures that incorporate neuroscientific priors into deep learning models. However, interpretability remains a significant limitation for their application in sensitive fields such as affective computing and cognitive modeling. In this work, we introduce a post-hoc interpretability framework tailored to dual-stream EEG classifiers, extending the Local Interpretable Model-Agnostic Explanations (LIME) approach to accommodate structured, bi-hemispheric inputs. Our method adapts LIME to handle structured two-branch inputs corresponding to left and right-hemisphere EEG channel groups. It decomposes prediction relevance into per-channel contributions across hemispheres and emotional classes. We apply this framework to a previously validated dual-branch recurrent neural network trained on EmoNeuroDB, a dataset of EEG recordings captured during a VR-based emotion elicitation task. The resulting explanations reveal emotion-specific hemispheric activation patterns consistent with known neurophysiological phenomena, such as frontal lateralization in joy and posterior asymmetry in sadness. Furthermore, we aggregate local explanations across samples to derive global channel importance profiles, enabling a neurophysiologically grounded interpretation of the model's decisions. Correlation analysis between symmetric electrodes further highlights the model's emotion-dependent lateralization behavior, supporting the functional asymmetries reported in affective neuroscience.
Authors:Hyein Hong, Sangbong Yoo, SeokHwan Choi, Jisue Kim, Seongbum Seo, Haneol Cho, Chansoo Kim, Yun Jang
Abstract:
Approaches to enhancing data quality (DQ) are classified into two main categories: data- and process-driven. However, prior research has predominantly utilized batch data preprocessing within the data-driven framework, which often proves insufficient for optimizing machine learning (ML) model performance and frequently leads to distortions in data characteristics. Existing studies have primarily focused on data preprocessing rather than genuine data quality improvement (DQI). In this paper, we introduce d-DQIVAR, a novel visual analytics system designed to facilitate DQI strategies aimed at improving ML model performance. Our system integrates visual analytics techniques that leverage both data-driven and process-driven approaches. Data-driven techniques tackle DQ issues such as imputation, outlier detection, deletion, format standardization, removal of duplicate records, and feature selection. Process-driven strategies encompass evaluating DQ and DQI procedures by considering DQ dimensions and ML model performance and applying the Kolmogorov-Smirnov test. We illustrate how our system empowers users to harness expert and domain knowledge effectively within a practical workflow through case studies, evaluations, and user studies.
Authors:Pouya Shaeri, Arash Karimi, Ariane Middel
Abstract:
Neural networks are often benchmarked using standard datasets such as MNIST, FashionMNIST, or other variants of MNIST, which, while accessible, are limited to generic classes such as digits or clothing items. For researchers working on domain-specific tasks, such as classifying trees, food items, or other real-world objects, these data sets are insufficient and irrelevant. Additionally, creating and publishing a custom dataset can be time consuming, legally constrained, or beyond the scope of individual projects. We present MNIST-Gen, an automated, modular, and adaptive framework for generating MNIST-style image datasets tailored to user-specified categories using hierarchical semantic categorization. The system combines CLIP-based semantic understanding with reinforcement learning and human feedback to achieve intelligent categorization with minimal manual intervention. Our hierarchical approach supports complex category structures with semantic characteristics, enabling fine-grained subcategorization and multiple processing modes: individual review for maximum control, smart batch processing for large datasets, and fast batch processing for rapid creation. Inspired by category theory, MNIST-Gen models each data transformation stage as a composable morphism, enhancing clarity, modularity, and extensibility. As proof of concept, we generate and benchmark two novel datasets-\textit{Tree-MNIST} and \textit{Food-MNIST}-demonstrating MNIST-Gen's utility for producing task-specific evaluation data while achieving 85\% automatic categorization accuracy and 80\% time savings compared to manual approaches.
Authors:Akriti Verma, Shama Islam, Valeh Moghaddam, Adnan Anwar
Abstract:
Online conversations are often interrupted by trolling, which causes emotional distress and conflict among users. Previous research has focused on moderating harmful content after it has been posted, but ways to manage emotions in real-time remain unexplored. This study suggests a comment queuing mechanism that delays comment publishing, encourages self-reflection, and reduces the impact of impulsive and toxic comments. To assess the efficacy of this approach, a mixed-method research design is used. An analysis of 15,000 user interactions on Reddit showed that this approach could reduce the spread of hate speech and anger by up to 15%, with only 4% of comments being delayed for about 47 seconds on average. We also surveyed users for feedback on the mechanism. The results showed that 93. 3\% of the participants thought that the queuing mechanism could help calm the discussions and showed interest in seeing it used on social media platforms. Furthermore, 83% believed it would reduce impulsive comments and balance the emotional tone in conversations. We found a strong link between users' typical emotional states while using social media and their perceptions of the delay, with calm users finding the mechanism helpful and frustrated users anticipating frustration.
Authors:Xiaohang Tang, Sam Wong, Zicheng He, Yalong Yang, Yan Chen
Abstract:
This paper introduces REVA, a human-AI system that expedites instructor review of voluminous AI-generated programming feedback by sequencing submissions to minimize cognitive context shifts and propagating instructor-driven revisions across semantically similar instances. REVA introduces a novel approach to human-AI collaboration in educational feedback by adaptively learning from instructors' attention in the review and revision process to continuously improve the feedback validation process. REVA's usefulness and effectiveness in improving feedback quality and the overall feedback review process were evaluated through a within-subjects lab study with 12 participants.
Authors:Chuxuan Zhang, Yasaman Etesam, Angelica Lim
Abstract:
We propose an approach to test embodied AI agents for interaction awareness and believability, particularly in scenarios where humans push them to their limits. Turing introduced the Imitation Game as a way to explore the question: "Can machines think?" The Total Turing Test later expanded this concept beyond purely verbal communication, incorporating perceptual and physical interaction. Building on this, we propose a new guiding question: "Can machines react?" and introduce the React to This (RTT) test for nonverbal behaviors, presenting results from an initial experiment.
Authors:Jiangkai Wu, Zhiyuan Ren, Liming Liu, Xinggong Zhang
Abstract:
AI Video Chat emerges as a new paradigm for Real-time Communication (RTC), where one peer is not a human, but a Multimodal Large Language Model (MLLM). This makes interaction between humans and AI more intuitive, as if chatting face-to-face with a real person. However, this poses significant challenges to latency, because the MLLM inference takes up most of the response time, leaving very little time for video streaming. Due to network uncertainty and instability, transmission latency becomes a critical bottleneck preventing AI from being like a real person. To address this, we propose Artic, an AI-oriented Real-time Communication framework, exploring the network requirement shift from "humans watching video" to "AI understanding video". To reduce bitrate dramatically while maintaining MLLM accuracy, we propose Context-Aware Video Streaming that recognizes the importance of each video region for chat and allocates bitrate almost exclusively to chat-important regions. To avoid packet retransmission, we propose Loss-Resilient Adaptive Frame Rate that leverages previous frames to substitute for lost/delayed frames while avoiding bitrate waste. To evaluate the impact of video streaming quality on MLLM accuracy, we build the first benchmark, named Degraded Video Understanding Benchmark (DeViBench). Finally, we discuss some open questions and ongoing solutions for AI Video Chat.
Authors:Zikun Deng, Jiabao Huang, Chenxi Ruan, Jialing Li, Shaowu Gao, Yi Cai
Abstract:
Spatial time series visualization offers scientific research pathways and analytical decision-making tools across various spatiotemporal domains. Despite many advanced methodologies, the seamless integration of temporal and spatial information remains a challenge. The space-time cube (STC) stands out as a promising approach for the synergistic presentation of spatial and temporal information, with successful applications across various spatiotemporal datasets. However, the STC is plagued by well-known issues such as visual occlusion and depth ambiguity, which are further exacerbated when dealing with large-scale spatial time series data. In this study, we introduce a novel technical framework termed VolumeSTCube, designed for continuous spatiotemporal phenomena. It first leverages the concept of the STC to transform discretely distributed spatial time series data into continuously volumetric data. Subsequently, volume rendering and surface rendering techniques are employed to visualize the transformed volumetric data. Volume rendering is utilized to mitigate visual occlusion, while surface rendering provides pattern details by enhanced lighting information. Lastly, we design interactions to facilitate the exploration and analysis from temporal, spatial, and spatiotemporal perspectives. VolumeSTCube is evaluated through a computational experiment, a real-world case study with one expert, and a controlled user study with twelve non-experts, compared against a baseline from prior work, showing its superiority and effectiveness in largescale spatial time series analysis.
Authors:Zikun Deng, Yuanbang Liu, Mingrui Zhu, Da Xiang, Haiyue Yu, Zicheng Su, Qinglong Lu, Tobias Schreck, Yi Cai
Abstract:
The design of urban road networks significantly influences traffic conditions, underscoring the importance of informed traffic planning. Traffic planning experts rely on specialized platforms to simulate traffic systems, assessing the efficacy of the road network across various states of modifications. Nevertheless, a prevailing issue persists: many existing traffic planning platforms exhibit inefficiencies in flexibly interacting with the road network's structure and attributes and intuitively comparing multiple states during the iterative planning process. This paper introduces TraSculptor, an interactive planning decision-making system. To develop TraSculptor, we identify and address two challenges: interactive modification of road networks and intuitive comparison of multiple network states. For the first challenge, we establish flexible interactions to enable experts to easily and directly modify the road network on the map. For the second challenge, we design a comparison view with a history tree of multiple states and a road-state matrix to facilitate intuitive comparison of road network states. To evaluate TraSculptor, we provided a usage scenario where the Braess's paradox was showcased, invited experts to perform a case study on the Sioux Falls network, and collected expert feedback through interviews.
Authors:Soobin Yim, Sangbong Yoo, Chanyoung Yoon, Chanyoung Jung, Chansoo Kim, Yun Jang, Ghulam Jilani Quadri
Abstract:
Accurate assessment of mental workload (MW) is crucial for understanding cognitive processes during visualization tasks. While EEG-based measures are emerging as promising alternatives to conventional assessment techniques, such as selfreport measures, studies examining consistency across these different methodologies are limited. In a preliminary study, we observed indications of potential discrepancies between EEGbased and self-reported MW measures. Motivated by these preliminary observations, our study further explores the discrepancies between EEG-based and self-reported MW assessment methods through an experiment involving visualization tasks. In the experiment, we employ two benchmark tasks: the Visualization Literacy Assessment Test (VLAT) and a Spatial Visualization (SV) task. EEG signals are recorded from participants using a 32-channel system at a sampling rate of 128 Hz during the visualization tasks. For each participant, MW is estimated using an EEG-based model built on a Graph Attention Network (GAT) architecture, and these estimates are compared with conventional MW measures to examine potential discrepancies. Our findings reveal notable discrepancies between task difficulty and EEG-based MW estimates, as well as between EEG-based and self-reported MW measures across varying task difficulty levels. Additionally, the observed patterns suggest the presence of unconscious cognitive effort that may not be captured by selfreport alone.
Authors:Luke Rivard, Sun Sun, Hongyu Guo, Wenhu Chen, Yuntian Deng
Abstract:
We introduce NeuralOS, a neural framework that simulates graphical user interfaces (GUIs) of operating systems by directly predicting screen frames in response to user inputs such as mouse movements, clicks, and keyboard events. NeuralOS combines a recurrent neural network (RNN), which tracks computer state, with a diffusion-based neural renderer that generates screen images. The model is trained on a large-scale dataset of Ubuntu XFCE recordings, which include both randomly generated interactions and realistic interactions produced by AI agents. Experiments show that NeuralOS successfully renders realistic GUI sequences, accurately captures mouse interactions, and reliably predicts state transitions like application launches. Although modeling fine-grained keyboard interactions precisely remains challenging, NeuralOS offers a step toward creating fully adaptive, generative neural interfaces for future human-computer interaction systems.
Authors:Md. Saif Hassan Onim, Travis S. Humble, Himanshu Thapliyal
Abstract:
We investigate the feasibility of inferring emotional states exclusively from physiological signals, thereby presenting a privacy-preserving alternative to conventional facial recognition techniques. We conduct a performance comparison of classical machine learning algorithms and hybrid quantum machine learning (QML) methods with a quantum kernel-based model. Our results indicate that the quantum-enhanced SVM surpasses classical counterparts in classification performance across all emotion categories, even when trained on limited datasets. The F1 scores over all classes are over 80% with around a maximum of 36% improvement in the recall values. The integration of wearable sensor data with quantum machine learning not only enhances accuracy and robustness but also facilitates unobtrusive emotion recognition. This methodology holds promise for populations with impaired communication abilities, such as individuals with Alzheimer's Disease and Related Dementias (ADRD) and veterans with Post-Traumatic Stress Disorder (PTSD). The findings establish an early foundation for passive emotional monitoring in clinical and assisted living conditions.
Authors:Federico Maria Cau, Giuseppe Desolda, Francesco Greco, Lucio Davide Spano, Luca Viganò
Abstract:
Phishing has become a prominent risk in modern cybersecurity, often used to bypass technological defences by exploiting predictable human behaviour. Warning dialogues are a standard mitigation measure, but the lack of explanatory clarity and static content limits their effectiveness. In this paper, we report on our research to assess the capacity of Large Language Models (LLMs) to generate clear, concise, and scalable explanations for phishing warnings. We carried out a large-scale between-subjects user study (N = 750) to compare the influence of warning dialogues supplemented with manually generated explanations against those generated by two LLMs, Claude 3.5 Sonnet and Llama 3.3 70B. We investigated two explanatory styles (feature-based and counterfactual) for their effects on behavioural metrics (click-through rate) and perceptual outcomes (e.g., trust, risk, clarity). The results indicate that well-constructed LLM-generated explanations can equal or surpass manually crafted explanations in reducing susceptibility to phishing; Claude-generated warnings exhibited particularly robust performance. Feature-based explanations were more effective for genuine phishing attempts, whereas counterfactual explanations diminished false-positive rates. Other variables such as workload, gender, and prior familiarity with warning dialogues significantly moderated warning effectiveness. These results indicate that LLMs can be used to automatically build explanations for warning users against phishing, and that such solutions are scalable, adaptive, and consistent with human-centred values.
Authors:Roberto Ulloa, Juhi Kulshrestha, Celina Kacperski
Abstract:
The rise of conversational AI (CAI), powered by large language models, is transforming how individuals access and interact with digital information. However, these tools may inadvertently amplify existing digital inequalities. This study investigates whether differences in formal education are associated with CAI avoidance, leveraging behavioral data from an online experiment (N = 1,636). Participants were randomly assigned to a control or an information-seeking task, either a traditional online search or a CAI (Perplexity AI). Task avoidance (operationalized as survey abandonment or providing unrelated responses during task assignment) was significantly higher in the CAI group (51%) compared to the search (30.9%) and control (16.8%) groups, with the highest CAI avoidance among participants with lower education levels (~74.4%). Structural equation modeling based on the theoretical framework UTAUT2 and LASSO regressions reveal that education is strongly associated with CAI avoidance, even after accounting for various cognitive and affective predictors of technology adoption. These findings underscore education's central role in shaping AI adoption and the role of self-selection biases in AI-related research, stressing the need for inclusive design to ensure equitable access to emerging technologies.
Authors:Matt Adams, Nick Tandavanitj, Steve Benford, Ayse Kucukyilmaz, Victor Ngo, Simon Castle-Green, Guido Salimberi, Pepita Bernard, Joel Fischer, Alan Chamberlain, Eike Schneiders, Clara Mancini
Abstract:
Cat Royale is an artwork created by the artists Blast Theory to explore the question of whether we should trust robots to care for our loved ones. The artists endeavoured to create a `Cat Utopia', a luxurious environment that was inhabited by a family of three cats for six hours a day for twelve days, at the centre of which a robot arm played with them by wielding toys. Behind the scenes, the decision engine recommended games based on ongoing assessment of their happiness. A video installation featuring an eight-hour movie of the cats' exploits is currently touring worldwide, provoking audiences to engage with the question of trust in autonomous systems.
Authors:Videep Venkatesha, Mariah Bradford, Nathaniel Blanchard
Abstract:
Collaborative Problem-Solving (CPS) markers capture key aspects of effective teamwork, such as staying on task, avoiding interruptions, and generating constructive ideas. An AI system that reliably detects these markers could help teachers identify when a group is struggling or demonstrating productive collaboration. Such a system requires an automated pipeline composed of multiple components. In this work, we evaluate how CPS detection is impacted by automating two critical components: transcription and speech segmentation. On the public Weights Task Dataset (WTD), we find CPS detection performance with automated transcription and segmentation methods is comparable to human-segmented and manually transcribed data; however, we find the automated segmentation methods reduces the number of utterances by 26.5%, impacting the the granularity of the data. We discuss the implications for developing AI-driven tools that support collaborative learning in classrooms.
Authors:Jan Kompatscher, Danqing Shi, Giovanna Varni, Tino Weinkauf, Antti Oulasvirta
Abstract:
Reinforcement learning from human feedback (RLHF) has emerged as a key enabling technology for aligning AI behavior with human preferences. The traditional way to collect data in RLHF is via pairwise comparisons: human raters are asked to indicate which one of two samples they prefer. We present an interactive visualization that better exploits the human visual ability to compare and explore whole groups of samples. The interface is comprised of two linked views: 1) an exploration view showing a contextual overview of all sampled behaviors organized in a hierarchical clustering structure; and 2) a comparison view displaying two selected groups of behaviors for user queries. Users can efficiently explore large sets of behaviors by iterating between these two views. Additionally, we devised an active learning approach suggesting groups for comparison. As shown by our evaluation in six simulated robotics tasks, our approach increases the final policy returns by 69.34%. It leads to lower error rates and better policies. We open-source the code that can be easily integrated into the RLHF training loop, supporting research on human-AI alignment.
Authors:Zhang Youpeng, Nuwan Janaka, Ashwin Ram, Yin Peilin, Tian Yang, Shengdong Zhao, Pierre Dragicevic
Abstract:
The rise of wearable smart devices raises unprecedented opportunities for self-improvement through ubiquitous behavior tracking and guidance. However, the design of effective wearable behavior intervention systems remains relatively unexplored. To address this gap, we conducted controlled studies focusing on the reduction of unwanted words (e.g., filler words, swear words) in daily communication through auditory feedback using wearable technology. We started with a design space exploration, considering various factors such as the type, duration, and timing of the auditory feedback. Then, we conducted pilot studies to reduce the space of design choices and prototyped a system called WSCoach (Wearable Speech Coach), which informs users when they utter unwanted words in near-real-time. To evaluate WSCoach, we compared it with a state-of-the-art mobile application supporting post-hoc conversation analysis. Both approaches were effective in reducing the occurrence of unwanted words, but WSCoach appears to be more effective in the long run. Finally, we discuss guidelines for the design of wearable audio-based behavior monitoring and intervention systems and highlight the potential of wearable technology for facilitating behavior correction and improvement. For supplementary material, please see the META Appendix and our OSF project at https://osf.io/6vhwn/?view_only=489498d3ac2d4703a17475fc6ca65dfa.
Authors:Zifei Wang, Emmanuel Abolarin, Kai Wu, Venkatarao Rebba, Jian Hu, Zhen Hu, Shan Bao, Feng Zhou
Abstract:
Electric vehicles (EVs) charging infrastructure is directly related to the overall EV user experience and thus impacts the widespread adoption of EVs. Understanding key factors that affect EV users' charging experience is essential for building a robust and user-friendly EV charging infrastructure. This study leverages about $17,000$ charging station (CS) reviews on Google Maps to explore EV user preferences for charging stations, employing ChatGPT 4.0 for aspect-based sentiment analysis. We identify twelve key aspects influencing user satisfaction, ranging from accessibility and reliability to amenities and pricing. Two distinct preference models are developed: a micro-level model focused on individual user satisfaction and a macro-level model capturing collective sentiment towards specific charging stations. Both models utilize the LightGBM algorithm for user preference prediction, achieving strong performance compared to other machine learning approaches. To further elucidate the impact of each aspect on user ratings, we employ SHAP (SHapley Additive exPlanations), a game-theoretic approach for interpreting machine learning models. Our findings highlight the significant impact of positive sentiment towards "amenities and location", coupled with negative sentiment regarding "reliability and maintenance", on overall user satisfaction. These insights offer actionable guidance to charging station operators, policymakers, and EV manufacturers, empowering them to enhance user experience and foster wider EV adoption.
Authors:Sifatul Anindho, Videep Venkatesha, Mariah Bradford, Anne M. Cleary, Nathaniel Blanchard
Abstract:
Collaborative problem solving (CPS) is a complex cognitive, social, and emotional process that is increasingly prevalent in educational and professional settings. This study investigates the emotional states of individuals during CPS using a mixed-methods approach. Teams of four first completed a novel CPS task. Immediately after, each individual was placed in an isolated room where they reviewed the video of their group performing the task and self-reported their internal experiences throughout the task. We performed a linguistic analysis of these internal monologues, providing insights into the range of emotions individuals experience during CPS. Our analysis showed distinct patterns in language use, including characteristic unigrams and bigrams, key words and phrases, emotion labels, and semantic similarity between emotion-related words.
Authors:Dorian Peters, Fernanda Espinoza, Marco da Re, Guido Ivetta, Luciana Benotti, Rafael A. Calvo
Abstract:
There is justifiable interest in leveraging conversational AI (CAI) for health across the majority world, but to be effective, CAI must respond appropriately within culturally and linguistically diverse contexts. Therefore, we need ways to address the fact that current LLMs exclude many lived experiences globally. Various advances are underway which focus on top-down approaches and increasing training data. In this paper, we aim to complement these with a bottom-up locally-grounded approach based on qualitative data collected during participatory workshops in Latin America. Our goal is to construct a rich and human-centred understanding of: a) potential areas of cultural misalignment in digital health; b) regional perspectives on chatbots for health and c)strategies for creating culturally-appropriate CAI; with a focus on the understudied Latin American context. Our findings show that academic boundaries on notions of culture lose meaning at the ground level and technologies will need to engage with a broader framework; one that encapsulates the way economics, politics, geography and local logistics are entangled in cultural experience. To this end, we introduce a framework for 'Pluriversal Conversational AI for Health' which allows for the possibility that more relationality and tolerance, rather than just more data, may be called for.
Authors:Conglin Ma, Jiatong Li, Sen-Zhe Xu, Ju Dai, Jie Liu, Feng Zhou
Abstract:
This paper introduces a novel VR-based system that redefines the acquisition of Hanzi character literacy by integrating traditional mortise-tenon joinery principles (HVRMT).Addressing the challenge of abstract character memorization in digital learning,our system deconstructs Hanzi components into interactive "structural radicals"akin to wooden joint modules.Leveraging PICO's 6DoF spatial tracking and LLM's morphological analysis,learners assemble stroke sequences with haptic feedback simulating wood-to-wood friction.Our system also supports multiplayer online experiences, enhancing engagement and memory retention while preserving intangible cultural heritage. This innovative approach not only enhances engagement and memory retention but also reconstructs the craft wisdom embedded in Chinese writing systems, offering new pathways for preserving intangible cultural heritage in digital ecosystems.For the demo,please refer to this link{https://youtu.be/oUwfFTRpFyo}.
Authors:Chuke Chen, Biao Luo, Nan Li, Boxiang Wang, Hang Yang, Jing Guo, Ming Xu
Abstract:
The rapid expansion of scientific data has widened the gap between analytical capability and research intent. Existing AI-based analysis tools, ranging from AutoML frameworks to agentic research assistants, either favor automation over transparency or depend on manual scripting that hinders scalability and reproducibility. We present ARIA (Automated Research Intelligence Assistant), a spec-driven, human-in-the-loop framework for automated and interpretable data analysis. ARIA integrates six interoperable layers, namely Command, Context, Code, Data, Orchestration, and AI Module, within a document-centric workflow that unifies human reasoning and machine execution. Through natural-language specifications, researchers define analytical goals while ARIA autonomously generates executable code, validates computations, and produces transparent documentation. Beyond achieving high predictive accuracy, ARIA can rapidly identify optimal feature sets and select suitable models, minimizing redundant tuning and repetitive experimentation. In the Boston Housing case, ARIA discovered 25 key features and determined XGBoost as the best performing model (R square = 0.93) with minimal overfitting. Evaluations across heterogeneous domains demonstrate ARIA's strong performance, interpretability, and efficiency compared with state-of-the-art systems. By combining AI for research and AI for science principles within a spec-driven architecture, ARIA establishes a new paradigm for transparent, collaborative, and reproducible scientific discovery.
Authors:Shalaleh Rismani, Renee Shelby, Leah Davis, Negar Rostamzadeh, AJung Moon
Abstract:
Over the past decade, an ecosystem of measures has emerged to evaluate the social and ethical implications of AI systems, largely shaped by high-level ethics principles. These measures are developed and used in fragmented ways, without adequate attention to how they are situated in AI systems. In this paper, we examine how existing measures used in the computing literature map to AI system components, attributes, hazards, and harms. Our analysis draws on a scoping review resulting in nearly 800 measures corresponding to 11 AI ethics principles. We find that most measures focus on four principles - fairness, transparency, privacy, and trust - and primarily assess model or output system components. Few measures account for interactions across system elements, and only a narrow set of hazards is typically considered for each harm type. Many measures are disconnected from where harm is experienced and lack guidance for setting meaningful thresholds. These patterns reveal how current evaluation practices remain fragmented, measuring in pieces rather than capturing how harms emerge across systems. Framing measures with respect to system attributes, hazards, and harms can strengthen regulatory oversight, support actionable practices in industry, and ground future research in systems-level understanding.
Authors:Leijie Wang, Kathryn Yurechko, Amy X. Zhang
Abstract:
While LLMs now enable users to create content classifiers easily through natural language, automatic prompt optimization techniques are often necessary to create performant classifiers. However, such techniques can fail to consider how social media users want to evolve their filters over the course of usage, including desiring to steer them in different ways during initialization and iteration. We introduce a user-centered prompt optimization technique, Promptimizer, that maintains high performance and ease-of-use but additionally (1) allows for user input into the optimization process and (2) produces final prompts that are interpretable. A lab experiment (n=16) found that users significantly preferred Promptimizer's human-in-the-loop optimization over a fully automatic approach. We further implement Promptimizer into Puffin, a tool to support YouTube content creators in creating and maintaining personal classifiers to manage their comments. Over a 3-week deployment with 10 creators, participants successfully created diverse filters to better understand their audiences and protect their communities.
Authors:Adib Bazgir, Amir Habibdoust, Xing Song, Yuwen Zhang
Abstract:
Alzheimer's disease (AD) presents a complex, multifaceted challenge to patients, caregivers, and the healthcare system, necessitating integrated and dynamic support solutions. While artificial intelligence (AI) offers promising avenues for intervention, current applications are often siloed, addressing singular aspects of the disease such as diagnostics or caregiver support without systemic integration. This paper proposes a novel methodological framework for a comprehensive, multi-agent system (MAS) designed for holistic Alzheimer's disease management. The objective is to detail the architecture of a collaborative ecosystem of specialized AI agents, each engineered to address a distinct challenge in the AD care continuum, from caregiver support and multimodal data analysis to automated research and clinical data interpretation. The proposed framework is composed of eight specialized, interoperable agents. These agents are categorized by function: (1) Caregiver and Patient Support, (2) Data Analysis and Research, and (3) Advanced Multimodal Workflows. The methodology details the technical architecture of each agent, leveraging a suite of advanced technologies including large language models (LLMs) such as GPT-4o and Gemini, multi-agent orchestration frameworks, Retrieval-Augmented Generation (RAG) for evidence-grounded responses, and specialized tools for web scraping, multimodal data processing, and in-memory database querying. This paper presents a detailed architectural blueprint for an integrated AI ecosystem for AD care. By moving beyond single-purpose tools to a collaborative, multi-agent paradigm, this framework establishes a foundation for developing more adaptive, personalized, and proactive solutions. This methodological approach aims to pave the way for future systems capable of synthesizing diverse data streams to improve patient outcomes and reduce caregiver burden.
Authors:Justus Flerlage, Alexander Acker, Odej Kao
Abstract:
Large Language Models (LLMs) have emerged as transformative tools for natural language understanding and user intent resolution, enabling tasks such as translation, summarization, and, increasingly, the orchestration of complex workflows. This development signifies a paradigm shift from conventional, GUI-driven user interfaces toward intuitive, language-first interaction paradigms. Rather than manually navigating applications, users can articulate their objectives in natural language, enabling LLMs to orchestrate actions across multiple applications in a dynamic and contextual manner. However, extant implementations frequently rely on cloud-based proprietary models, which introduce limitations in terms of privacy, autonomy, and scalability. For language-first interaction to become a truly robust and trusted interface paradigm, local deployment is not merely a convenience; it is an imperative. This limitation underscores the importance of evaluating the feasibility of locally deployable, open-source, and open-access LLMs as foundational components for future intent-based operating systems. In this study, we examine the capabilities of several open-source and open-access models in facilitating user intention resolution through machine assistance. A comparative analysis is conducted against OpenAI's proprietary GPT-4-based systems to assess performance in generating workflows for various user intentions. The present study offers empirical insights into the practical viability, performance trade-offs, and potential of open LLMs as autonomous, locally operable components in next-generation operating systems. The results of this study inform the broader discussion on the decentralization and democratization of AI infrastructure and point toward a future where user-device interaction becomes more seamless, adaptive, and privacy-conscious through locally embedded intelligence.
Authors:Mengdi Chu, Zefeng Qiu, Meng Ling, Shuning Jiang, Robert S. Laramee, Michael Sedlmair, Jian Chen
Abstract:
We investigate the perceived visual complexity (VC) in data visualizations using objective image-based metrics. We collected VC scores through a large-scale crowdsourcing experiment involving 349 participants and 1,800 visualization images. We then examined how these scores align with 12 image-based metrics spanning information-theoretic, clutter, color, and our two object-based metrics. Our results show that both low-level image properties and the high-level elements affect perceived VC in visualization images; The number of corners and distinct colors are robust metrics across visualizations. Second, feature congestion, an information-theoretic metric capturing statistical patterns in color and texture, is the strongest predictor of perceived complexity in visualizations rich in the same stimuli; edge density effectively explains VC in node-link diagrams. Additionally, we observe a bell-curve effect for text annotations: increasing text-to-ink ratio (TiR) initially reduces complexity, reaching an optimal point, beyond which further text increases perceived complexity. Our quantification pipeline is also interpretable, enabling metric-based explanations, grounded in the VisComplexity2K dataset, bridging computational metrics with human perceptual responses. osf.io/5xe8a has the preregistration and osf.io/bdet6 has the VisComplexity2K dataset, source code, and all Apdx. and figures.
Authors:Lirui Guo, Michael G. Burke, Wynita M. Griggs
Abstract:
Shared Autonomous Vehicles (SAVs) are likely to become an important part of the transportation system, making effective human-SAV interactions an important area of research. This paper introduces a dataset of 200 human-SAV interactions to further this area of study. We present an open-source human-SAV conversational dataset, comprising both textual data (e.g., 2,136 human-SAV exchanges) and empirical data (e.g., post-interaction survey results on a range of psychological factors). The dataset's utility is demonstrated through two benchmark case studies: First, using random forest modeling and chord diagrams, we identify key predictors of SAV acceptance and perceived service quality, highlighting the critical influence of response sentiment polarity (i.e., perceived positivity). Second, we benchmark the performance of an LLM-based sentiment analysis tool against the traditional lexicon-based TextBlob method. Results indicate that even simple zero-shot LLM prompts more closely align with user-reported sentiment, though limitations remain. This study provides novel insights for designing conversational SAV interfaces and establishes a foundation for further exploration into advanced sentiment modeling, adaptive user interactions, and multimodal conversational systems.
Authors:Shramay Palta, Peter Rankel, Sarah Wiegreffe, Rachel Rudinger
Abstract:
We investigate the degree to which human plausibility judgments of multiple-choice commonsense benchmark answers are subject to influence by (im)plausibility arguments for or against an answer, in particular, using rationales generated by LLMs. We collect 3,000 plausibility judgments from humans and another 13,600 judgments from LLMs. Overall, we observe increases and decreases in mean human plausibility ratings in the presence of LLM-generated PRO and CON rationales, respectively, suggesting that, on the whole, human judges find these rationales convincing. Experiments with LLMs reveal similar patterns of influence. Our findings demonstrate a novel use of LLMs for studying aspects of human cognition, while also raising practical concerns that, even in domains where humans are ``experts'' (i.e., common sense), LLMs have the potential to exert considerable influence on people's beliefs.
Authors:Frederic Gmeiner, Kenneth Holstein, Nikolas Martelaro
Abstract:
Recent advancements in multimodal generative AI (GenAI) enable the creation of personal context-aware real-time agents that, for example, can augment user workflows by following their on-screen activities and providing contextual assistance. However, prototyping such experiences is challenging, especially when supporting people with domain-specific tasks using real-time inputs such as speech and screen recordings. While prototyping an LLM-based proactive support agent system, we found that existing prototyping and evaluation methods were insufficient to anticipate the nuanced situational complexity and contextual immediacy required. To overcome these challenges, we explored a novel user-centered prototyping approach that combines counterfactual video replay prompting and hybrid Wizard-of-Oz methods to iteratively design and refine agent behaviors. This paper discusses our prototyping experiences, highlighting successes and limitations, and offers a practical guide and an open-source toolkit for UX designers, HCI researchers, and AI toolmakers to build more user-centered and context-aware multimodal agents.
Authors:Niharika Mathur, Tamara Zubatiy, Agata Rozga, Jodi Forlizzi, Elizabeth Mynatt
Abstract:
Designing Conversational AI systems to support older adults requires these systems to explain their behavior in ways that align with older adults' preferences and context. While prior work has emphasized the importance of AI explainability in building user trust, relatively little is known about older adults' requirements and perceptions of AI-generated explanations. To address this gap, we conducted an exploratory Speed Dating study with 23 older adults to understand their responses to contextually grounded AI explanations. Our findings reveal the highly context-dependent nature of explanations, shaped by conversational cues such as the content, tone, and framing of explanation. We also found that explanations are often interpreted as interactive, multi-turn conversational exchanges with the AI, and can be helpful in calibrating urgency, guiding actionability, and providing insights into older adults' daily lives for their family members. We conclude by discussing implications for designing context-sensitive and personalized explanations in Conversational AI systems.
Authors:Niharika Mathur, Tamara Zubatiy, Agata Rozga, Elizabeth Mynatt
Abstract:
Designing Conversational AI systems to support older adults requires more than usability and reliability, it also necessitates robustness in handling conversational breakdowns. In this study, we investigate how older adults navigate and repair such breakdowns while interacting with a voice-based AI system deployed in their homes for medication management. Through a 20-week in-home deployment with 7 older adult participant dyads, we analyzed 844 recoded interactions to identify conversational breakdowns and user-initiated repair strategies. Through findings gleaned from post-deployment interviews, we reflect on the nature of these breakdowns and older adults' experiences of mitigating them. We identify four types of conversational breakdowns and demonstrate how older adults draw on their situated knowledge and environment to make sense of and recover from these disruptions, highlighting the cognitive effort required in doing so. Our findings emphasize the collaborative nature of interactions in human-AI contexts, and point to the need for AI systems to better align with users' expectations for memory, their routines, and external resources in their environment. We conclude by discussing opportunities for AI systems to integrate contextual knowledge from older adults' sociotechnical environment and to facilitate more meaningful and user-centered interactions.
Authors:Batu El, James Zou
Abstract:
Large language models (LLMs) are increasingly shaping how information is created and disseminated, from companies using them to craft persuasive advertisements, to election campaigns optimizing messaging to gain votes, to social media influencers boosting engagement. These settings are inherently competitive, with sellers, candidates, and influencers vying for audience approval, yet it remains poorly understood how competitive feedback loops influence LLM behavior. We show that optimizing LLMs for competitive success can inadvertently drive misalignment. Using simulated environments across these scenarios, we find that, 6.3% increase in sales is accompanied by a 14.0% rise in deceptive marketing; in elections, a 4.9% gain in vote share coincides with 22.3% more disinformation and 12.5% more populist rhetoric; and on social media, a 7.5% engagement boost comes with 188.6% more disinformation and a 16.3% increase in promotion of harmful behaviors. We call this phenomenon Moloch's Bargain for AI--competitive success achieved at the cost of alignment. These misaligned behaviors emerge even when models are explicitly instructed to remain truthful and grounded, revealing the fragility of current alignment safeguards. Our findings highlight how market-driven optimization pressures can systematically erode alignment, creating a race to the bottom, and suggest that safe deployment of AI systems will require stronger governance and carefully designed incentives to prevent competitive dynamics from undermining societal trust.
Authors:B. Sankar, Devottama Sen, Dibakar Sen
Abstract:
Humans navigate and understand complex visual environments by subconsciously quantifying what they see, a process known as visual enumeration. However, traditional studies using flat screens fail to capture the cognitive dynamics of this process over the large visual fields of real-world scenes. To address this gap, we developed an immersive virtual reality system with integrated eye-tracking to investigate the interplay between attention and memory during complex enumeration. We conducted a two-phase experiment where participants enumerated scenes of either simple abstract shapes or complex real-world objects, systematically varying the task intent (e.g., selective vs. exhaustive counting) and the spatial layout of items. Our results reveal that task intent is the dominant factor driving performance, with selective counting imposing a significant cognitive cost that was dramatically amplified by stimulus complexity. The semantic processing required for real-world objects reduced accuracy and suppressed memory recall, while the influence of spatial layout was secondary and statistically non-significant when a higher-order cognitive task intent was driving the human behaviour. We conclude that real-world enumeration is fundamentally constrained by the cognitive load of semantic processing, not just the mechanics of visual search. Our findings demonstrate that under high cognitive demand, the effort to understand what we are seeing directly limits our capacity to remember it.
Authors:Yuhan Guo, Xingyou Liu, Xiaoru Yuan, Kai Xu
Abstract:
Text-to-image generative models can be tremendously valuable in supporting creative tasks by providing inspirations and enabling quick exploration of different design ideas. However, one common challenge is that users may still not be able to find anything useful after many hours and hundreds of images. Without effective help, users can easily get lost in the vast design space, forgetting what has been tried and what has not. In this work, we first propose the Design-Exploration model to formalize the exploration process. Based on this model, we create an interactive visualization system, PromptMap, to support exploratory text-to-image generation. Our system provides a new visual representation that better matches the non-linear nature of such processes, making them easier to understand and follow. It utilizes novel visual representations and intuitive interactions to help users structure the many possibilities that they can explore. We evaluated the system through in-depth interviews with users.
Authors:Hita Kambhamettu, Alyssa Hwang, Philippe Laban, Andrew Head
Abstract:
AI question answering systems increasingly generate responses with attributions to sources. However, the task of verifying the actual content of these attributions is in most cases impractical. In this paper, we present attribution gradients as a solution. Attribution gradients provide integrated, incremental affordances for diving into an attributed passage. A user can decompose a sentence of an answer into its claims. For each claim, the user can view supporting and contradictory excerpts mined from sources. Those excerpts serve as clickable conduits into the source (in our application, scientific papers). When evidence itself contains more citations, the UI unpacks the evidence into excerpts from the cited sources. These features of attribution gradients facilitate concurrent interconnections among answer, claim, excerpt, and context. In a usability study, we observed greater engagement with sources and richer revision in a task where participants revised an attributed AI answer with attribution gradients and a baseline.
Authors:Naaz Sibia, Valeria Ramirez Osorio, Jessica Wen, Rutwa Engineer, Angela Zavaleta Bernuy, Andrew Petersen, Michael Liut, Carolina Nobre
Abstract:
Novice programmers often struggle to understand how code executes and to form the abstract mental models necessary for effective problem-solving, challenges that are amplified in large, diverse introductory courses where students' backgrounds, language proficiencies, and prior experiences vary widely. This study examines whether interactive, multi-representational visualizations, combining synchronized code views, memory diagrams, and conceptual analogies, can help manage cognitive load and foster engagement more effectively than single-visual or text-only approaches. Over a 12-week deployment in a high-enrolment introductory Python course (N = 829), students who relied solely on text-based explanations reported significantly higher immediate mental effort than those using visual aids, although overall cognitive load did not differ significantly among conditions. The multi-representational approach consistently yielded higher engagement than both single-visual and text-only methods. Usage logs indicated that learners' interaction patterns varied with topic complexity, and predictive modelling suggested that early experiences of high cognitive load were associated with lower longer-term perceptions of clarity and helpfulness. Individual differences, including language proficiency and prior programming experience, moderated these patterns. By integrating multiple external representations with scaffolded support adapted to diverse learner profiles, our findings highlight design considerations for creating visualization tools that more effectively support novices learning to program.
Authors:Yuhao Kang, Junda Chen, Liu Liu, Kshitij Sharmad, Martina Mazzarello, Simone Mora, Fabio Duarte, Carlo Ratti
Abstract:
The way residents perceive safety plays an important role in how they use public spaces. Studies have combined large-scale street view images and advanced computer vision techniques to measure the perception of safety of urban environments. Despite their success, such studies have often overlooked the specific environmental visual factors that draw human attention and trigger people's feelings of safety perceptions. In this study, we introduce a computational framework that enriches the existing body of literature on place perception by using eye-tracking systems with street view images and deep learning approaches. Eye-tracking systems quantify not only what users are looking at but also how long they engage with specific environmental elements. This allows us to explore the nuance of which visual environmental factors influence human safety perceptions. We conducted our research in Helsingborg, Sweden, where we recruited volunteers outfitted with eye-tracking systems. They were asked to indicate which of the two street view images appeared safer. By examining participants' focus on specific features using Mean Object Ratio in Highlighted Regions (MoRH) and Mean Object Hue (MoH), we identified key visual elements that attract human attention when perceiving safe environments. For instance, certain urban infrastructure and public space features draw more human attention while the sky is less relevant in influencing safety perceptions. These insights offer a more human-centered understanding of which urban features influence human safety perceptions. Furthermore, we compared the real human attention from eye-tracking systems with attention maps obtained from eXplainable Artificial Intelligence (XAI) results. Several XAI models were tested, and we observed that XGradCAM and EigenCAM most closely align with human safety perceptual patterns.
Authors:Chelse Swoopes, Ziwei Gu, Elena L. Glassman
Abstract:
Legal professionals spend significant time reading, writing, and interpreting complex documents, yet research has not fully captured how they approach these tasks or what they expect from skimming and writing-support tools. To examine practices and views on emerging tools, we interviewed 22 legal professionals about workflows, challenges, and technology use. In each session, we leveraged prior HCI-based skimming and writing prototypes that surface emergent cross-document relationships and support AI-resilient interaction (noticing, judging, and recovering from model errors or unexpected behavior); participants completed a contextual fit evaluation to assess whether and how they would use the tools, which document types, and at what stages in their work. Our analysis details limitations and challenges in workflows, domain-specific feedback on AI-resilient interfaces, and expert insights on legal tech design. These findings offer actionable guidance for technology designers developing reading and writing-support for legal professionals, and for legal professionals seeking peer-informed tool integration strategies.
Authors:Nami Ogawa, Yuki Okafuji, Yuji Hatada, Jun Baba
Abstract:
The rapid development of artificial intelligence (AI) has fundamentally transformed creative work practices in the design industry. Existing studies have identified both opportunities and challenges for creative practitioners in their collaboration with generative AI and explored ways to facilitate effective human-AI co-creation. However, there is still a limited understanding of designers' collaboration with AI that supports creative processes distinct from generative AI. To address these gaps, this study focuses on understanding designers' collaboration with decision-making AI, which supports the convergence process in the creative workflow, as opposed to the divergent process supported by generative AI. Specifically, we conducted a case study at an online advertising design company to explore how professional graphic designers at the company perceive the impact of decision-making AI on their creative work practices. The case company incorporated an AI system that predicts the effectiveness of advertising design into the design workflow as a decision-making support tool. Findings from interviews with 12 designers identified how designers trust and rely on AI, its perceived benefits and challenges, and their strategies for navigating the challenges. Based on the findings, we discuss design recommendations for integrating decision-making AI into the creative design workflow.
Authors:Huatao Xu, Yan Zhang, Wei Gao, Guobin Shen, Mo Li
Abstract:
This paper presents the first nationwide deployment of human activity recognition (HAR) technology in the on-demand food delivery industry. We successfully adapted the state-of-the-art LIMU-BERT foundation model to the delivery platform. Spanning three phases over two years, the deployment progresses from a feasibility study in Yangzhou City to nationwide adoption involving 500,000 couriers across 367 cities in China. The adoption enables a series of downstream applications, and large-scale tests demonstrate its significant operational and economic benefits, showcasing the transformative potential of HAR technology in real-world applications. Additionally, we share lessons learned from this deployment and open-source our LIMU-BERT pretrained with millions of hours of sensor data.
Authors:Faraz Faruqi, Josha Paonaskar, Riley Schuler, Aiden Prevey, Carson Taylor, Anika Tak, Anthony Guinto, Eeshani Shilamkar, Natarith Cheenaruenthong, Martin Nisser
Abstract:
This paper introduces WireBend-kit, a desktop wirebending machine and computational design tool for creating 3D wireframe structures. Combined, they allow users to rapidly and inexpensively create custom 3D wireframe structures from aluminum wire. Our design tool is implemented in freely available software and allows users to generate virtual wireframe designs and assess their fabricability. A path-planning procedure automatically converts the wireframe design into fabrication instructions for our machine while accounting for material elasticity and kinematic error sources. The custom machine costs $293 in parts and can form aluminum wire into 3D wireframe structures through an ordered sequence of feed, bend, and rotate instructions. Our technical evaluation reveals our system's ability to overcome odometrically accumulating errors inherent to wirebending in order to produce accurate 3D structures from inexpensive hardware. Finally, we provide application examples demonstrating the design space enabled by Wirebend-kit.
Authors:Hao-Ping Lee, Yu-Ju Yang, Matthew Bilik, Isadora Krsek, Thomas Serban von Davier, Kyzyl Monteiro, Jason Lin, Shivani Agarwal, Jodi Forlizzi, Sauvik Das
Abstract:
AI creates and exacerbates privacy risks, yet practitioners lack effective resources to identify and mitigate these risks. We present Privy, a tool that guides practitioners through structured privacy impact assessments to: (i) identify relevant risks in novel AI product concepts, and (ii) propose appropriate mitigations. Privy was shaped by a formative study with 11 practitioners, which informed two versions -- one LLM-powered, the other template-based. We evaluated these two versions of Privy through a between-subjects, controlled study with 24 separate practitioners, whose assessments were reviewed by 13 independent privacy experts. Results show that Privy helps practitioners produce privacy assessments that experts deemed high quality: practitioners identified relevant risks and proposed appropriate mitigation strategies. These effects were augmented in the LLM-powered version. Practitioners themselves rated Privy as being useful and usable, and their feedback illustrates how it helps overcome long-standing awareness, motivation, and ability barriers in privacy work.
Authors:Rukhshan Haroon, Kyle Wigdor, Katie Yang, Nicole Toumanios, Eileen T. Crehan, Fahad Dogar
Abstract:
Communication challenges between autistic and neurotypical individuals stem from a mutual lack of understanding of each other's distinct, and often contrasting, communication styles. Yet, autistic individuals are expected to adapt to neurotypical norms, making interactions inauthentic and mentally exhausting for them. To help redress this imbalance, we build NeuroBridge, an online platform that utilizes large language models (LLMs) to simulate: (a) an AI character that is direct and literal, a style common among many autistic individuals, and (b) four cross-neurotype communication scenarios in a feedback-driven conversation between this character and a neurotypical user. Through NeuroBridge, neurotypical individuals gain a firsthand look at autistic communication, and reflect on their role in shaping cross-neurotype interactions. In a user study with 12 neurotypical participants, we find that NeuroBridge improved their understanding of how autistic people may interpret language differently, with all describing autism as a social difference that "needs understanding by others" after completing the simulation. Participants valued its personalized, interactive format and described AI-generated feedback as "constructive", "logical" and "non-judgmental". Most perceived the portrayal of autism in the simulation as accurate, suggesting that users may readily accept AI-generated (mis)representations of disabilities. To conclude, we discuss design implications for disability representation in AI, the need for making NeuroBridge more personalized, and LLMs' limitations in modeling complex social scenarios.
Authors:Tourjana Islam Supti, Israa Abuelezz, Aya Muhanad, Mahmoud Barhmagi, Ala Yankouskaya, Khaled M. Khan, Aiman Erbad, Raian Ali
Abstract:
This study investigates the efficacy of role-taking and literacy-based interventions in reducing the influence of appearance cues, such as gender, age, ethnicity, and clothing style, on trust and risk-taking in social engineering contexts. A-4 (Group: Control, Literacy, Persuader, Persuadee) * 2 (Time: Pre, Post) mixed factorial design was implemented over two weeks with 139 participants. The control group received no material. The literacy group attended two sessions focused on how behavior can be similar regardless of appearance cues. The persuader group completed three sessions, learning how to use such cues to influence others. The persuadee group attended three sessions involving the selection, justification, and reflection on personas and scenarios. Scenarios centered on financial and rental advice. A one-week gap followed before post-intervention testing. In both pre- and post-tests, participants assessed personas combining appearance cues, offering mobile hotspots with potential risk. They rated trust and willingness to take the risk. Validated measures and scenarios were used, including word-of-mouth and issue involvement scales. It was expected that cue influence would diminish post-intervention. However, no significant within- or between-group differences emerged. Findings raise concerns about the effectiveness of debiasing efforts and call for reconsideration of approaches using literacy, role-taking, rehearsal, drama, and simulation.
Authors:Veda Duddu, Jash Rajesh Parekh, Andy Mao, Hanyi Min, Ziang Xiao, Vedant Das Swain, Koustuv Saha
Abstract:
Workplace negotiations are undermined by psychological barriers, which can even derail well-prepared tactics. AI offers personalized and always -- available negotiation coaching, yet its effectiveness for negotiation preparedness remains unclear. We built Trucey, a prototype AI coach grounded in Brett's negotiation model. We conducted a between-subjects experiment (N=267), comparing Trucey, ChatGPT, and a traditional negotiation Handbook, followed by in-depth interviews (N=15). While Trucey showed the strongest reductions in fear relative to both comparison conditions, the Handbook outperformed both AIs in usability and psychological empowerment. Interviews revealed that the Handbook's comprehensive, reviewable content was crucial for participants' confidence and preparedness. In contrast, although participants valued AI's rehearsal capability, its guidance often felt verbose and fragmented -- delivered in bits and pieces that required additional effort -- leaving them uncertain or overwhelmed. These findings challenge assumptions of AI superiority and motivate hybrid designs that integrate structured, theory-driven content with targeted rehearsal, clear boundaries, and adaptive scaffolds to address psychological barriers and support negotiation preparedness.
Authors:Xun Li, Rodrigo Santa Cruz, Mingze Xi, Hu Zhang, Madhawa Perera, Ziwei Wang, Ahalya Ravendran, Brandon J. Matthews, Feng Xu, Matt Adcock, Dadong Wang, Jiajun Liu
Abstract:
To enable robots to comprehend high-level human instructions and perform complex tasks, a key challenge lies in achieving comprehensive scene understanding: interpreting and interacting with the 3D environment in a meaningful way. This requires a smart map that fuses accurate geometric structure with rich, human-understandable semantics. To address this, we introduce the 3D Queryable Scene Representation (3D QSR), a novel framework built on multimedia data that unifies three complementary 3D representations: (1) 3D-consistent novel view rendering and segmentation from panoptic reconstruction, (2) precise geometry from 3D point clouds, and (3) structured, scalable organization via 3D scene graphs. Built on an object-centric design, the framework integrates with large vision-language models to enable semantic queryability by linking multimodal object embeddings, and supporting object-level retrieval of geometric, visual, and semantic information. The retrieved data are then loaded into a robotic task planner for downstream execution. We evaluate our approach through simulated robotic task planning scenarios in Unity, guided by abstract language instructions and using the indoor public dataset Replica. Furthermore, we apply it in a digital duplicate of a real wet lab environment to test QSR-supported robotic task planning for emergency response. The results demonstrate the framework's ability to facilitate scene understanding and integrate spatial and semantic reasoning, effectively translating high-level human instructions into precise robotic task planning in complex 3D environments.
Authors:Frederick Choi, Eshwar Chandrasekharan
Abstract:
Personalized recommendation algorithms deliver content to the user on most major social media platforms. While these algorithms are crucial for helping users find relevant content, users lack meaningful control over them. This reduces users' sense of agency and their ability to adapt social media feeds to their own needs and values. Efforts have been made to give users more control over their feeds, but usability remains a major barrier to adoption. Drawing upon prior work in designing teachable social media feeds, we built Pilot, a novel system of controls and feedback mechanisms on BlueSky that are expressive, intuitive, and integrated directly into the feed to allow users to customize their feed while they browse. Our user study suggests the system increases the user's sense of agency, and encourages them to think more critically about curating their feeds. We synthesize design implications for enhancing user agency over social media feeds.
Authors:Shayan Monadjemi, Yuhan Guo, Kai Xu, Alex Endert, Anamaria Crisan
Abstract:
Artificial agents are increasingly integrated into data analysis workflows, carrying out tasks that were primarily done by humans. Our research explores how the introduction of automation re-calibrates the dynamic between humans and automating technology. To explore this question, we conducted a scoping review encompassing twenty years of mixed-initiative visual analytic systems. To describe and contrast the relationship between humans and automation, we developed an integrated taxonomy to delineate the objectives of these mixed-initiative visual analytics tools, how much automation they support, and the assumed roles of humans. Here, we describe our qualitative approach of integrating existing theoretical frameworks with new codes we developed. Our analysis shows that the visualization research literature lacks consensus on the definition of mixed-initiative systems and explores a limited potential of the collaborative interaction landscape between people and automation. Our research provides a scaffold to advance the discussion of human-AI collaboration during visual data analysis.
Authors:Yao Song, Christoph Gebhardt, Yi-Chi Liao, Christian Holz
Abstract:
3D Mixed Reality interfaces have nearly unlimited space for layout placement, making automatic UI adaptation crucial for enhancing the user experience. Such adaptation is often formulated as a multi-objective optimization (MOO) problem, where multiple, potentially conflicting design objectives must be balanced. However, selecting a final layout is challenging since MOO typically yields a set of trade-offs along a Pareto frontier. Prior approaches often required users to manually explore and evaluate these trade-offs, a time-consuming process that disrupts the fluidity of interaction. To eliminate this manual and laborous step, we propose a novel optimization approach that efficiently determines user preferences from a minimal number of UI element adjustments. These determined rankings are translated into priority levels, which then drive our priority-based MOO algorithm. By focusing the search on user-preferred solutions, our method not only identifies UIs that are more aligned with user preferences, but also automatically selects the final design from the Pareto frontier; ultimately, it minimizes user effort while ensuring personalized layouts. Our user study in a Mixed Reality setting demonstrates that our preference-guided approach significantly reduces manual adjustments compared to traditional methods, including fully manual design and exhaustive Pareto front searches, while maintaining high user satisfaction. We believe this work opens the door for more efficient MOO by seamlessly incorporating user preferences.
Authors:Ziyou Li, Agnia Sergeyuk, Maliheh Izadi
Abstract:
Large Language Models are transforming software engineering, yet prompt management in practice remains ad hoc, hindering reliability, reuse, and integration into industrial workflows. We present Prompt-with-Me, a practical solution for structured prompt management embedded directly in the development environment. The system automatically classifies prompts using a four-dimensional taxonomy encompassing intent, author role, software development lifecycle stage, and prompt type. To enhance prompt reuse and quality, Prompt-with-Me suggests language refinements, masks sensitive information, and extracts reusable templates from a developer's prompt library. Our taxonomy study of 1108 real-world prompts demonstrates that modern LLMs can accurately classify software engineering prompts. Furthermore, our user study with 11 participants shows strong developer acceptance, with high usability (Mean SUS=73), low cognitive load (Mean NASA-TLX=21), and reported gains in prompt quality and efficiency through reduced repetitive effort. Lastly, we offer actionable insights for building the next generation of prompt management and maintenance tools for software engineering workflows.
Authors:Deuksin Kwon, Kaleen Shrestha, Bin Han, Elena Hayoung Lee, Gale Lucas
Abstract:
Large Language Models (LLMs) are increasingly deployed in socially complex, interaction-driven tasks, yet their ability to mirror human behavior in emotionally and strategically complex contexts remains underexplored. This study assesses the behavioral alignment of personality-prompted LLMs in adversarial dispute resolution by simulating multi-turn conflict dialogues that incorporate negotiation. Each LLM is guided by a matched Five-Factor personality profile to control for individual variation and enhance realism. We evaluate alignment across three dimensions: linguistic style, emotional expression (e.g., anger dynamics), and strategic behavior. GPT-4.1 achieves the closest alignment with humans in linguistic style and emotional dynamics, while Claude-3.7-Sonnet best reflects strategic behavior. Nonetheless, substantial alignment gaps persist. Our findings establish a benchmark for alignment between LLMs and humans in socially complex interactions, underscoring both the promise and the limitations of personality conditioning in dialogue modeling.
Authors:Deepak Varuvel Dennison, Mohit Jain, Tanuja Ganu, Aditya Vashistha
Abstract:
AI technologies are increasingly deployed in high-stakes domains such as education, healthcare, law, and agriculture to address complex challenges in non-Western contexts. This paper examines eight real-world deployments spanning seven countries and 18 languages, combining 17 interviews with AI developers and domain experts with secondary research. Our findings identify six cross-cutting factors - Language, Domain, Demography, Institution, Task, and Safety - that structured how systems were designed and deployed. These factors were shaped by sociocultural (diversity, practices), institutional (resources, policies), and technological (capabilities, limits) influences. We find that building AI systems required extensive collaboration between AI developers and domain experts. Notably, human resources proved more critical to achieving safe and effective systems in high-stakes domains than technological expertise alone. We present an analytical framework that synthesizes these dynamics and conclude with recommendations for designing AI for social good systems that are culturally grounded, equitable, and responsive to the needs of non-Western contexts.
Authors:Arpita Wadhwa, Aditya Vashistha, Mohit Jain
Abstract:
AI-driven chatbots are increasingly used to support community health workers (CHWs) in developing regions, yet little is known about how cultural framings in chatbot design shape trust in collectivist contexts where decisions are rarely made in isolation. This paper examines how CHWs in rural India responded to chatbots that delivered identical health content but varied in one specific cultural lever -- social norms. Through a mixed-methods study with 61 ASHAs who compared four normative framings -- neutral, descriptive, narrative identity, and injunctive authority -- we (1) analyze how framings influence preferences and trust, and (2) compare effects across low- and high-ambiguity scenarios. Results show that narrative framings were most preferred but encouraged uncritical overreliance, while authority framings were least preferred yet supported calibrated trust. We conclude with design recommendations for dynamic framing strategies that adapt to context and argue for calibrated trust -- following correct advice and resisting incorrect advice -- as a critical evaluation metric for safe, culturally-grounded AI.
Authors:Surej Mouli, Ramaswamy Palaniappan, Emmanuel Molefi, Ian McLoughlin
Abstract:
Steady State Visual Evoked Potential (SSVEP) methods for brain computer interfaces (BCI) are popular due to higher information transfer rate and easier setup with minimal training, compared to alternative methods. With precisely generated visual stimulus frequency, it is possible to translate brain signals into external actions or signals. Traditionally, SSVEP data is collected from the occipital region using electrodes with or without gel, normally mounted on a head cap. In this experimental study, we develop an in ear electrode to collect SSVEP data for four different flicker frequencies and compare against occipital scalp electrode data. Data from five participants demonstrates the feasibility of in-ear electrode based SSVEP, significantly enhancing the practicability of wearable BCI applications.
Authors:Gabriela C. Zapata, Bill Cope, Mary Kalantzis, Duane Searsmith
Abstract:
This study investigates the use of generative AI to support formative assessment through machine generated reviews of peer reviews in graduate online courses in a public university in the United States. Drawing on Systemic Functional Linguistics and Appraisal Theory, we analyzed 120 metareviews to explore how generative AI feedback constructs meaning across ideational, interpersonal, and textual dimensions. The findings suggest that generative AI can approximate key rhetorical and relational features of effective human feedback, offering directive clarity while also maintaining a supportive stance. The reviews analyzed demonstrated a balance of praise and constructive critique, alignment with rubric expectations, and structured staging that foregrounded student agency. By modeling these qualities, AI metafeedback has the potential to scaffold feedback literacy and enhance leaner engagement with peer review.
Authors:Kexue Fu, Jiaye Leng, Yawen Zhang, Jingfei Huang, Yihang Zuo, Runze Cai, Zijian Ding, Ray LC, Shengdong Zhao, Qinyuan Lei
Abstract:
Balancing scientific exposition and narrative engagement is a central challenge in science communication. To examine how to achieve balance, we conducted a formative study with four science communicators and a literature review of science communication practices, focusing on their workflows and strategies. These insights revealed how creators iteratively shift between exposition and engagement but often lack structured support. Building on this, we developed SpatialBalancing, a co-writing system that connects human spatial reasoning with the linguistic intelligence of large language models. The system visualizes revision trade-offs in a dual-axis space, where users select strategy-based labels to generate, compare, and refine versions during the revision process. This spatial externalization transforms revision into spatial navigation, enabling intentional iterations that balance scientific rigor with narrative appeal. In a within-subjects study (N=16), SpatialBalancing enhanced metacognitive reflection, flexibility, and creative exploration, demonstrating how coupling spatial reasoning with linguistic generation fosters monitoring in iterative science communication writing.
Authors:Tobias Weinberg, Ricardo E. Gonzalez Penuela, Stephanie Valencia, Thijs Roumen
Abstract:
Generic AI auto-complete for message composition often fails to capture the nuance of personal identity, requiring significant editing. While harmless in low-stakes settings, for users of Augmentative and Alternative Communication (AAC) devices, who rely on such systems for everyday communication, this editing burden is particularly acute. Intuitively, the need for edits would be lower if language models were personalized to the communication of the specific user. While technically feasible, such personalization raises socio-technical questions: what are the implications of logging one's own conversations, and how does personalization affect privacy, authorship, and control? We explore these questions through an autoethnographic study in three phases: (1) seven months of collecting all the lead author's AAC communication data, (2) fine-tuning a model on this dataset, and (3) three months of daily use of personalized AI suggestions. We reflect on these phases through continuous diary entries and interaction logs. Our findings highlight the value of personalization as well as implications on privacy, authorship, and blurring the boundaries of self-expression.
Authors:Kexue Fu, Jingfei Huang, Long Ling, Sumin Hong, Yihang Zuo, Ray LC, Toby Jia-jun Li
Abstract:
Humans think visually-we remember in images, dream in pictures, and use visual metaphors to communicate. Yet, most creative writing tools remain text-centric, limiting how authors plan and translate ideas. We present Vistoria, a system for synchronized text-image co-editing in fictional story writing that treats visuals and text as coequal narrative materials. A formative Wizard-of-Oz co-design study with 10 story writers revealed how sketches, images, and annotations serve as essential instruments for ideation and organization. Drawing on theories of Instrumental Interaction and Structural Mapping, Vistoria introduces multimodal operations-lasso, collage, filters, and perspective shifts that enable seamless narrative exploration across modalities. A controlled study with 12 participants shows that co-editing enhances expressiveness, immersion, and collaboration, enabling writers to explore divergent directions, embrace serendipitous randomness, and trace evolving storylines. While multimodality increased cognitive demand, participants reported stronger senses of authorship and agency. These findings demonstrate how multimodal co-editing expands creative potential by balancing abstraction and concreteness in narrative development.
Authors:Sihun Baek, Zhehan Qu, Maria Gorlatova
Abstract:
Despite the growing use of AR in safety-critical domains, the field lacks a systematic understanding of how different types of distraction affect user behavior in AR environments. To address this gap, we present AR-TMT, an AR adaptation of the Trail Making Test that spatially renders targets for sequential selection on the Magic Leap 2. We implemented distractions in three categories: top-down, bottom-up, and spatial distraction based on Wolfe's Guided Search model, and captured performance, gaze, motor behavior, and subjective load measures to analyze user attention and behavior. A user study with 34 participants revealed that top-down distraction degraded performance through semantic interference, while bottom-up distraction disrupted initial attentional engagement. Spatial distraction destabilized gaze behavior, leading to more scattered and less structured visual scanning patterns. We also found that performance was correlated with attention control ($R^2 = .20$--$.35$) under object-based distraction conditions, where distractors possessed task-relevant features. The study offers insights into distraction mechanisms and their impact on users, providing opportunities for generalization to ecologically relevant AR tasks while underscoring the need to address the unique demands of AR environments.
Authors:Synthia Wang, Sai Teja Peddinti, Nina Taft, Nick Feamster
Abstract:
Large Language Models (LLMs) such as ChatGPT can infer personal attributes from seemingly innocuous text, raising privacy risks beyond memorized data leakage. While prior work has demonstrated these risks, little is known about how users estimate and respond. We conducted a survey with 240 U.S. participants who judged text snippets for inference risks, reported concern levels, and attempted rewrites to block inference. We compared their rewrites with those generated by ChatGPT and Rescriber, a state-of-the-art sanitization tool. Results show that participants struggled to anticipate inference, performing a little better than chance. User rewrites were effective in just 28\% of cases - better than Rescriber but worse than ChatGPT. We examined our participants' rewriting strategies, and observed that while paraphrasing was the most common strategy it is also the least effective; instead abstraction and adding ambiguity were more successful. Our work highlights the importance of inference-aware design in LLM interactions.
Authors:Synthia Wang, Yuwei Cheng, Austin Song, Sarah Keedy, Marc Berman, Nick Feamster
Abstract:
Limited access to mental health care has motivated the use of digital tools and conversational agents powered by large language models (LLMs), yet their quality and reception remain unclear. We present a study comparing therapist-written responses to those generated by ChatGPT, Gemini, and Llama for real patient questions. Text analysis showed that LLMs produced longer, more readable, and lexically richer responses with a more positive tone, while therapist responses were more often written in the first person. In a survey with 150 users and 23 licensed therapists, participants rated LLM responses as clearer, more respectful, and more supportive than therapist-written answers. Yet, both groups of participants expressed a stronger preference for human therapist support. These findings highlight the promise and limitations of LLMs in mental health, underscoring the need for designs that balance their communicative strengths with concerns of trust, privacy, and accountability.
Authors:Riya Gill, Ievgeniia Kuzminykh, Maher Salem, Bogdan Ghita
Abstract:
This study aims to develop an adaptive learning platform that leverages generative AI to automate assessment creation and feedback delivery. The platform provides self-correcting tests and personalised feedback that adapts to each learners progress and history, ensuring a tailored learning experience. The study involves the development and evaluation of a web-based application for revision for the UK Driving Theory Test. The platform generates dynamic, non-repetitive question sets and offers adaptive feedback based on user performance over time. The effectiveness of AI-generated assessments and feedback is evaluated through expert review and model analysis. The results show the successful generation of relevant and accurate questions, alongside positive and helpful feedback. The personalised test generation closely aligns with expert-created assessments, demonstrating the reliability of the system. These findings suggest that generative AI can enhance learning outcomes by adapting to individual student needs and offering tailored support. This research introduces an AI-powered assessment and feedback system that goes beyond traditional solutions by incorporating automation and adaptive learning. The non-memoryless feedback mechanism ensures that student history and performance inform future assessments, making the learning process more effective and individualised. This contrasts with conventional systems that provide static, one-time feedback without considering past progress.
Authors:Noga H. Rotman, Tiago Ferreira, Hila Peleg, Mark Silberstein, Alexandra Silva
Abstract:
Requests for Comments (RFCs) are extensive specification documents for network protocols, but their prose-based format and their considerable length often impede precise operational understanding. We present RFSeek, an interactive tool that automatically extracts visual summaries of protocol logic from RFCs. RFSeek leverages large language models (LLMs) to generate provenance-linked, explorable diagrams, surfacing both official state machines and additional logic found only in the RFC text. Compared to existing RFC visualizations, RFSeek's visual summaries are more transparent and easier to audit against their textual source. We showcase the tool's potential through a series of use cases, including guided knowledge extraction and semantic diffing, applied to protocols such as TCP, QUIC, PPTP, and DCCP. In practice, RFSeek not only reconstructs the RFC diagrams included in some specifications, but, more interestingly, also uncovers important logic such as nodes or edges described in the text but missing from those diagrams. RFSeek further derives new visualization diagrams for complex RFCs, with QUIC as a representative case. Our approach, which we term \emph{Summary Visualization}, highlights a promising direction: combining LLMs with formal, user-customized visualizations to enhance protocol comprehension and support robust implementations.
Authors:Thuy Ngoc Nguyen, Anita Williams Woolley, Cleotilde Gonzalez
Abstract:
Coordinated teamwork is essential in fast-paced decision-making environments that require dynamic adaptation, often without an opportunity for explicit communication. Although implicit coordination has been extensively considered in the existing literature, the majority of work has focused on co-located, synchronous teamwork (such as sports teams) or, in distributed teams, primarily on coordination of knowledge work. However, many teams (firefighters, military, law enforcement, emergency response) must coordinate their movements in physical space without the benefit of visual cues or extensive explicit communication. This paper investigates how three dimensions of spatial coordination, namely exploration diversity, movement specialization, and adaptive spatial proximity, influence team performance in a collaborative online search and rescue task where explicit communication is restricted and team members rely on movement patterns to infer others' intentions and coordinate actions. Our metrics capture the relational aspects of teamwork by measuring spatial proximity, distribution patterns, and alignment of movements within shared environments. We analyze data from 34 four-person teams (136 participants) assigned to specialized roles in a search and rescue task. Results show that spatial specialization positively predicts performance, while adaptive spatial proximity exhibits a marginal inverted U-shaped relationship, suggesting moderate levels of adaptation are optimal. Furthermore, the temporal dynamics of these metrics differentiate high- from low-performing teams over time. These findings provide insights into implicit spatial coordination in role-based teamwork and highlight the importance of balanced adaptive strategies, with implications for training and AI-assisted team support systems.
Authors:Crystal Qian, Kehang Zhu, John Horton, Benjamin S. Manning, Vivian Tsai, James Wexler, Nithum Thain
Abstract:
Coordination tasks traditionally performed by humans are increasingly being delegated to autonomous agents. As this pattern progresses, it becomes critical to evaluate not only these agents' performance but also the processes through which they negotiate in dynamic, multi-agent environments. Furthermore, different agents exhibit distinct advantages: traditional statistical agents, such as Bayesian models, may excel under well-specified conditions, whereas large language models (LLMs) can generalize across contexts. In this work, we compare humans (N = 216), LLMs (GPT-4o, Gemini 1.5 Pro), and Bayesian agents in a dynamic negotiation setting that enables direct, identical-condition comparisons across populations, capturing both outcomes and behavioral dynamics. Bayesian agents extract the highest surplus through aggressive optimization, at the cost of frequent trade rejections. Humans and LLMs can achieve similar overall surplus, but through distinct behaviors: LLMs favor conservative, concessionary trades with few rejections, while humans employ more strategic, risk-taking, and fairness-oriented behaviors. Thus, we find that performance parity -- a common benchmark in agent evaluation -- can conceal fundamental differences in process and alignment, which are critical for practical deployment in real-world coordination tasks.
Authors:Jiahui Li, Sean Papay, Roman Klinger
Abstract:
The output of large language models (LLM) is unstable, due to both non-determinism of the decoding process as well as to prompt brittleness. While the intrinsic non-determinism of LLM generation may mimic existing uncertainty in human annotations through distributional shifts in outputs, it is largely assumed, yet unexplored, that the prompt brittleness effect is unique to LLMs. This raises the question: do human annotators show similar sensitivity to instruction changes? If so, should prompt brittleness in LLMs be considered problematic? One may alternatively hypothesize that prompt brittleness correctly reflects human annotation variances. To fill this research gap, we systematically compare the effects of prompt modifications on LLMs and identical instruction modifications for human annotators, focusing on the question of whether humans are similarly sensitive to prompt perturbations. To study this, we prompt both humans and LLMs for a set of text classification tasks conditioned on prompt variations. Our findings indicate that both humans and LLMs exhibit increased brittleness in response to specific types of prompt modifications, particularly those involving the substitution of alternative label sets or label formats. However, the distribution of human judgments is less affected by typographical errors and reversed label order than that of LLMs.
Authors:Tanvi Bajpai, Eshwar Chandrasekharan
Abstract:
On Reddit, the moderation queue (modqueue) is a primary interface for moderators to review reported content. Despite its central role in Reddit's community-reliant moderation model, little is known about how moderators actually use it in practice. To address this gap, we surveyed 110 moderators, who collectively oversee more than 400 unique subreddits, and asked them about their usage of the modqueue. Modqueue practices vary widely: some moderators approach it as a daily checklist, others as a hub to infer community-wide patterns, and many still find the queue insufficient to inform their moderation decisions. We also identify persistent challenges around review coordination, inconsistent interface signals, and reliance on third-party tools. Taken together, we show the modqueue is neither a one-size-fits-all solution nor sufficient on its own for supporting moderator review. Our work highlights design opportunities for more modular, integrated, and customizable platform infrastructures that better support the diversity of moderator workflows.
Authors:Zahra Atf, Peter R Lewis
Abstract:
Large language models (LLMs) are increasingly used in high-stakes settings, where explaining uncertainty is both technical and ethical. Probabilistic methods are often opaque and misaligned with expectations of transparency. We propose a framework based on rule-based moral principles for handling uncertainty in LLM-generated text. Using insights from moral psychology and virtue ethics, we define rules such as precaution, deference, and responsibility to guide responses under epistemic or aleatoric uncertainty. These rules are encoded in a lightweight Prolog engine, where uncertainty levels (low, medium, high) trigger aligned system actions with plain-language rationales. Scenario-based simulations benchmark rule coverage, fairness, and trust calibration. Use cases in clinical and legal domains illustrate how moral reasoning can improve trust and interpretability. Our approach offers a transparent, lightweight alternative to probabilistic models for socially responsible natural language generation.
Authors:Kateryna Melnyk, Lee Friedman, Oleg Komogortsev
Abstract:
Gaze prediction is a diverse field of study with multiple research focuses and practical applications. This article investigates how recurrent neural networks and transformers perform short-term gaze prediction. We used three models: a three-layer long-short-term memory (LSTM) network, a simple transformer-encoder model (TF), and a classification-predictor network (ClPr), which simultaneously classifies the signal into eye movement events and predicts the positions of gaze. The performance of the models was evaluated for ocular fixations and saccades of various amplitudes and as a function of individual differences in both typical and extreme cases. On average, LSTM performed better on fixations and saccades, whereas TF and ClPr demonstrated more precise results for post-saccadic periods. In extreme cases, the best-performing models vary depending on the type of eye movement. We reviewed the difference between the median $P_{50}$ and high-percentile $P_{95}$ error profiles across subjects. The subjects for which the models perform the best overall do not necessarily exhibit the lowest $P_{95}$ values, which supports the idea of analyzing extreme cases separately in future work. We explore the trade-offs between the proposed solutions and provide practical insights into model selection for gaze prediction.
Authors:Yingke Ding, Zeyu Wang, Xiyuxing Zhang, Hongbin Chen, Zhenan Xu
Abstract:
Traditional hearing aids often rely on static fittings that fail to adapt to their dynamic acoustic environments. We propose CAFA, a Context-Adaptive Fitting Advisor that provides personalized, real-time hearing aid adjustments through a multi-agent Large Language Model (LLM) workflow. CAFA combines live ambient audio, audiograms, and user feedback in a multi-turn conversational system. Ambient sound is classified into conversation, noise, or quiet with 91.2\% accuracy using a lightweight neural network based on YAMNet embeddings. This system utilizes a modular LLM workflow, comprising context acquisition, subproblem classification, strategy provision, and ethical regulation, and is overseen by an LLM Judge. The workflow translates context and feedback into precise, safe tuning commands. Evaluation confirms that real-time sound classification enhances conversational efficiency. CAFA exemplifies how agentic, multimodal AI can enable intelligent, user-centric assistive technologies.
Authors:Yi Wang, Haodong Zhang, Hongqi Li
Abstract:
Motor imagery (MI) based brain-computer interfaces (BCIs) hold significant potential for assistive technologies and neurorehabilitation. However, the precise and efficient decoding of MI remains challenging due to their non-stationary nature and low signal-to-noise ratio. This paper introduces a novel end-to-end deep learning framework of Discriminative Residual Dense Convolutional Autoencoder with Spatio-Temporal Graph Neural Network (DRDCAE-STGNN) to enhance the MI feature learning and classification. Specifically, the DRDCAE module leverages residual-dense connections to learn discriminative latent representations through joint reconstruction and classifica-tion, while the STGNN module captures dynamic spatial dependencies via a learnable graph adjacency matrix and models temporal dynamics using bidirectional long short-term memory (LSTM). Extensive evaluations on BCI Competition IV 2a, 2b, and PhysioNet datasets demonstrate state-of-the-art performance, with average accuracies of 95.42%, 97.51%, and 90.15%, respectively. Ablation studies confirm the contribution of each component, and interpreta-bility analysis reveals neurophysiologically meaningful connectivity patterns. Moreover, despite its complexity, the model maintains a feasible parameter count and an inference time of 0.32 ms per sample. These results indicate that our method offers a robust, accurate, and interpretable solution for MI-EEG decoding, with strong generalizability across subjects and tasks and meeting the requirements for potential real-time BCI applications.
Authors:Lennart Luettgau, Hannah Rose Kirk, Kobi Hackenburg, Jessica Bergs, Henry Davidson, Henry Ogden, Divya Siddarth, Saffron Huang, Christopher Summerfield
Abstract:
Conversational AI systems are increasingly being used in place of traditional search engines to help users complete information-seeking tasks. This has raised concerns in the political domain, where biased or hallucinated outputs could misinform voters or distort public opinion. However, in spite of these concerns, the extent to which conversational AI is used for political information-seeking, as well the potential impact of this use on users' political knowledge, remains uncertain. Here, we address these questions: First, in a representative national survey of the UK public (N = 2,499), we find that in the week before the 2024 election as many as 32% of chatbot users - and 13% of eligible UK voters - have used conversational AI to seek political information relevant to their electoral choice. Second, in a series of randomised controlled trials (N = 2,858 total) we find that across issues, models, and prompting strategies, conversations with AI increase political knowledge (increase belief in true information and decrease belief in misinformation) to the same extent as self-directed internet search. Taken together, our results suggest that although people in the UK are increasingly turning to conversational AI for information about politics, this shift may not lead to increased public belief in political misinformation.
Authors:Xinyue Chen, Kunlin Ruan, Kexin Phyllis Ju, Nathan Yap, Xu Wang
Abstract:
As AI tools become increasingly embedded in cognitively demanding tasks such as note-taking, questions remain about whether they enhance or undermine cognitive engagement. This paper examines the "AI Assistance Dilemma" in note-taking, investigating how varying levels of AI support affect user engagement and comprehension. In a within-subject experiment, we asked participants (N=30) to take notes during lecture videos under three conditions: Automated AI (high assistance with structured notes), Intermediate AI (moderate assistance with real-time summary, and Minimal AI (low assistance with transcript). Results reveal that Intermediate AI yields the highest post-test scores and Automated AI the lowest. Participants, however, preferred the automated setup due to its perceived ease of use and lower cognitive effort, suggesting a discrepancy between preferred convenience and cognitive benefits. Our study provides insights into designing AI assistance that preserves cognitive engagement, offering implications for designing moderate AI support in cognitive tasks.
Authors:Eduard Kuric, Peter Demcak, Matus Krajcovic
Abstract:
To keep card sorting with a lot of cards concise, a common strategy for gauging mental models involves presenting participants with fewer randomly selected cards instead of the full set. This is a decades-old practice, but its effects lacked systematic examination. To assess how randomized subsets affect data, we conducted an experiment with 160 participants. We compared results between full and randomized 60\% card sets, then analyzed sample size requirements and the impacts of individual personality and cognitive factors. Our results demonstrate that randomized subsets can yield comparable similarity matrices to standard card sorting, but thematic patterns in categories can differ. Increased data variability also warrants larger sample sizes (25-35 for 60% card subset). Results indicate that personality traits and cognitive reflection interact with card sorting. Our research suggests evidence-based practices for conducting card sorting while exposing the influence of study design and individual differences on measurement of mental models.
Authors:Ratanond Koonchanok, Alex Kale, Khairi Reda
Abstract:
Recent advances in Generative AI have transformed how users interact with data analysis through natural language interfaces. However, many systems rely too heavily on LLMs, creating risks of hallucination, opaque reasoning, and reduced user control. We present a hybrid visual analysis system that integrates GenAI in a constrained, high-level role to support statistical modeling while preserving transparency and user agency. GenAI translates natural language intent into formal statistical formulations, while interactive visualizations surface model behavior, residual patterns, and hypothesis comparisons to guide iterative exploration. Model fitting, diagnostics, and hypothesis testing are delegated entirely to a structured R-based backend, ensuring correctness, interpretability, and reproducibility. By combining GenAI-assisted intent translation with visualization-driven reasoning, our approach broadens access to modeling tools without compromising rigor. We present an example use case of the tool and discuss challenges and opportunities for future research.
Authors:Sarfaroz Yunusov, Kaige Chen, Kazi Nishat Anwar, Ali Emami
Abstract:
As Large Language Models (LLMs) increasingly integrate into everyday workflows, where users shape outcomes through multi-turn collaboration, a critical question emerges: do users with different personality traits systematically prefer certain LLMs over others? We conducted a study with 32 participants evenly distributed across four Keirsey personality types, evaluating their interactions with GPT-4 and Claude 3.5 across four collaborative tasks: data analysis, creative writing, information retrieval, and writing assistance. Results revealed significant personality-driven preferences: Rationals strongly preferred GPT-4, particularly for goal-oriented tasks, while idealists favored Claude 3.5, especially for creative and analytical tasks. Other personality types showed task-dependent preferences. Sentiment analysis of qualitative feedback confirmed these patterns. Notably, aggregate helpfulness ratings were similar across models, showing how personality-based analysis reveals LLM differences that traditional evaluations miss.
Authors:B. Sankar, Dibakar Sen
Abstract:
Conventional product design is a cognitively demanding process, limited by its time-consuming nature, reliance on subjective expertise, and the opaque translation of inspiration into tangible concepts. This research introduces a novel, attention-aware framework that integrates two synergistic systems: EUPHORIA, an immersive Virtual Reality environment using eye-tracking to implicitly capture a designer's aesthetic preferences, and RETINA, an agentic AI pipeline that translates these implicit preferences into concrete design outputs. The foundational principles were validated in a two-part study. An initial study correlated user's implicit attention with explicit preference and the next one correlated mood to attention. A comparative study where 4 designers solved challenging design problems using 4 distinct workflows, from a manual process to an end-to-end automated pipeline, showed the integrated EUPHORIA-RETINA workflow was over 4 times more time-efficient than the conventional method. A panel of 50 design experts evaluated the 16 final renderings. Designs generated by the fully automated system consistently received the highest Worthiness (calculated by an inverse Plackett-Luce model based on gradient descent optimization) and Design Effectiveness scores, indicating superior quality across 8 criteria: novelty, visual appeal, emotional resonance, clarity of purpose, distinctiveness of silhouette, implied materiality, proportional balance, & adherence to the brief. This research presents a validated paradigm shift from traditional Computer-Assisted Design (CAD) to a collaborative model of Designer-Assisting Computers (DAC). By automating logistical and skill-dependent generative tasks, the proposed framework elevates the designer's role to that of a creative director, synergizing human intuition with the generative power of agentic AI to produce higher-quality designs more efficiently.
Authors:Kaushall Senthil Nathan, Jieun Lee, Derrick M. Wang, Geneva M. Smith, Eugene Kukshinov, Daniel Harley, Lennart E. Nacke
Abstract:
Teammate performance evaluation fundamentally shapes intervention design in video games. However, our current understanding stems primarily from competitive E-Sports contexts where individual performance directly impacts outcomes. This research addresses whether performance evaluation mechanisms and behavioural responses identified in competitive games generalize to casual cooperative games. We investigated how casual players evaluate teammate competence and respond behaviourally in a controlled between-subjects experiment (N=23). We manipulated confederate performance in Overcooked 2, combining observations, NASA TLX self-reports, and interviews. We present two key findings. (1) Observations revealed frustration behaviours completely absent in self-report data. Thus, these instruments assess fundamentally distinct constructs. (2) Participants consistently evaluated teammate performance through relative comparison rather than absolute metrics. This contradicts task-performance operationalizations dominant in competitive gaming research. Hence, performance evaluation frameworks from competitive contexts cannot be directly applied to casual cooperative games. We provide empirical evidence that performance evaluation in casual games requires a comparative operationalization.
Authors:Ernest Lim, Yajie Vera He, Jared Joselowitz, Kate Preston, Mohita Chowdhury, Louis Williams, Aisling Higham, Katrina Mason, Mariane Melo, Tom Lawton, Yan Jia, Ibrahim Habli
Abstract:
Despite the growing use of large language models (LLMs) in clinical dialogue systems, existing evaluations focus on task completion or fluency, offering little insight into the behavioral and risk management requirements essential for safety-critical systems. This paper presents MATRIX (Multi-Agent simulaTion fRamework for safe Interactions and conteXtual clinical conversational evaluation), a structured, extensible framework for safety-oriented evaluation of clinical dialogue agents.
MATRIX integrates three components: (1) a safety-aligned taxonomy of clinical scenarios, expected system behaviors and failure modes derived through structured safety engineering methods; (2) BehvJudge, an LLM-based evaluator for detecting safety-relevant dialogue failures, validated against expert clinician annotations; and (3) PatBot, a simulated patient agent capable of producing diverse, scenario-conditioned responses, evaluated for realism and behavioral fidelity with human factors expertise, and a patient-preference study.
Across three experiments, we show that MATRIX enables systematic, scalable safety evaluation. BehvJudge with Gemini 2.5-Pro achieves expert-level hazard detection (F1 0.96, sensitivity 0.999), outperforming clinicians in a blinded assessment of 240 dialogues. We also conducted one of the first realism analyses of LLM-based patient simulation, showing that PatBot reliably simulates realistic patient behavior in quantitative and qualitative evaluations. Using MATRIX, we demonstrate its effectiveness in benchmarking five LLM agents across 2,100 simulated dialogues spanning 14 hazard scenarios and 10 clinical domains.
MATRIX is the first framework to unify structured safety engineering with scalable, validated conversational AI evaluation, enabling regulator-aligned safety auditing. We release all evaluation tools, prompts, structured scenarios, and datasets.
Authors:Xiaolin He, Zirui Li, Xinwei Wang, Riender Happee, Meng Wang
Abstract:
Perceived risk in automated vehicles (AVs) can create the very danger that automation is meant to prevent: a frightened rider may hesitate when seconds matter, misjudge hazards, or disengage. However, measuring how perceived risk evolves in real time during driving remains challenging, leaving a gap in decoding such hidden psychological states. Here, we present a novel method to time-continuously measure and decode perceived risk. We conducted a controlled experiment where 2,164 participants viewed high-fidelity videos of common highway driving scenes and provided 141,628 discrete safety ratings. Through continuous-signal reconstruction of the discrete ratings, we obtained 236 hours of time-continuous perceived risk data - the largest perceived risk dataset to date. Leveraging this dataset, we trained deep neural networks that predict moment-by-moment perceived risk from vehicle kinematics with a mean relative error below $3\%$. Explainable AI analysis uncovers which factors determine perceived risk in real time. Our findings demonstrate a new paradigm for quantifying dynamic passenger experience and psychological constructs in real time. These findings can guide the design of AVs and other machines that operate in close proximity to people, adjusting behaviour before trust erodes, and help realise automation's benefits in transport, healthcare, and service robotics.
Authors:Joy Lai, David Black, Kelly Beaton, Bing Ye, Alex Mihailidis
Abstract:
Caregivers of people living with dementia (PLwD) experience stress when verifying whether tasks are truly completed, even with digital reminder systems. Generative AI, such as GPT-4, may help by automating task verification through follow-up questioning and decision support.
This feasibility study evaluates an AI-powered task verification system integrated with digital reminders for PLwD. It examines (1) GPT-4's ability to generate effective follow-up questions, (2) the accuracy of an AI-driven response flagging mechanism, and (3) the role of caregiver feedback in refining system adaptability. A simulated pipeline was tested on 64 anonymized reminders. GPT-4 generated follow-up questions with and without contextual information about PLwD routines. Responses were classified into High, Medium, or Low concern, and simulated caregiver feedback was used to refine outputs.
Results show that contextual information and caregiver input improved the clarity and relevance of AI-generated questions. The flagging system accurately identified concerns, particularly for safety-critical tasks, though subjective or non-urgent tasks remained challenging. Findings demonstrate the feasibility of AI-assisted task verification in dementia care. Context-aware AI prompts and caregiver feedback can enhance task monitoring, reduce caregiver stress, and strengthen PLwD support. Future work should focus on real-world validation and scalability.
Authors:Deep Anil Patel, Iain Melvin, Christopher Malon, Martin Renqiang Min
Abstract:
Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding and generating human-like text, yet they largely operate as reactive agents, responding only when directly prompted. This passivity creates an "awareness gap," limiting their potential as truly collaborative partners in dynamic human discussions. We introduce $\textit{DiscussLLM}$, a framework designed to bridge this gap by training models to proactively decide not just $\textit{what}$ to say, but critically, $\textit{when}$ to speak. Our primary contribution is a scalable two-stage data generation pipeline that synthesizes a large-scale dataset of realistic multi-turn human discussions. Each discussion is annotated with one of five intervention types (e.g., Factual Correction, Concept Definition) and contains an explicit conversational trigger where an AI intervention adds value. By training models to predict a special silent token when no intervention is needed, they learn to remain quiet until a helpful contribution can be made. We explore two architectural baselines: an integrated end-to-end model and a decoupled classifier-generator system optimized for low-latency inference. We evaluate these models on their ability to accurately time interventions and generate helpful responses, paving the way for more situationally aware and proactive conversational AI.
Authors:Arran Zeyu Wang, David Borland, David Gotz
Abstract:
The increasing capture and analysis of large-scale longitudinal health data offer opportunities to improve healthcare and advance medical understanding. However, a critical gap exists between (a) -- the observation of patterns and correlations, versus (b) -- the understanding of true causal mechanisms that drive outcomes. An accurate understanding of the underlying mechanisms that cause various changes in medical status is crucial for decision-makers across various healthcare domains and roles, yet inferring causality from real-world observational data is difficult for both methodological and practical challenges. This Grand Challenge advocates increased Visual Analytics (VA) research on this topic to empower people with the tool for sound causal reasoning from health data. We note this is complicated by the complex nature of medical data -- the volume, variety, sparsity, and temporality of health data streams make the use of causal inference algorithms difficult. Combined with challenges imposed by the realities of health-focused settings, including time constraints and traditional medical work practices, existing causal reasoning approaches are valuable but insufficient. We argue that advances in research can lead to new VA tools that augment human expertise with intuitive and robust causal inference capabilities, which can help realize a new paradigm of data-driven, causality-aware healthcare practices that improve human health outcomes.
Authors:Sameha AlShakhsi, Ala Yankouskaya, Magnus Liebherr, Raian Ali
Abstract:
There is an urgent need for reliable, culturally validated instruments to assess psychological responses to AI in general and large language models (LLMs). This need is global issue, but it is especially urgent among Arabic-speaking populations, where AI and LLMs adoption is accelerating, yet psychometric tools remain limited. This study presents the first validation of the LLM-D12, a dual-dimensional scale assessing Instrumental and Relationship Dependency on LLMs, in an Arab sample. A total of 250 Arab participants completed the Arabic version of the LLM-D12. Confirmatory Factor Analysis confirms the original 2-factor structure of LLM-D12 with all items showing good loading of corresponding Instrumental and Relationship Dependency. The scale showed good to excellent internal reliability (Cronbach alpha is 0.90 for Total, 0.85 for Instrumental Dependency, and 0.90 for Relationship Dependency). External validation revealed that Instrumental Dependency was positively associated with AI acceptance and internet addiction, while Relationship Dependency was linked to lower need for cognition and greater trustworthiness of LLM, demonstrating sensitivity of this instrument to different use and personal factors. These findings confirm that Arabic LLM-D12 is a psychometrically sound, culturally appropriate instrument, offering a necessary tool for research, education, and policy concerning AI and LLMs engagement in Arab contexts.
Authors:Mosh Levy, Zohar Elyoseph, Yoav Goldberg
Abstract:
A new generation of AI models generates step-by-step reasoning text before producing an answer. This text appears to offer a human-readable window into their computation process, and is increasingly relied upon for transparency and interpretability. However, it is unclear whether human understanding of this text matches the model's actual computational process. In this paper, we investigate a necessary condition for correspondence: the ability of humans to identify which steps in a reasoning text causally influence later steps. We evaluated humans on this ability by composing questions based on counterfactual measurements and found a significant discrepancy: participant accuracy was only 29%, barely above chance (25%), and remained low (42%) even when evaluating the majority vote on questions with high agreement. Our results reveal a fundamental gap between how humans interpret reasoning texts and how models use it, challenging its utility as a simple interpretability tool. We argue that reasoning texts should be treated as an artifact to be investigated, not taken at face value, and that understanding the non-human ways these models use language is a critical research direction.
Authors:Hannah Selder, Florian Fischer, Per Ola Kristensson, Arthur Fleig
Abstract:
Designing effective reward functions is critical for reinforcement learning-based biomechanical simulations, yet HCI researchers and practitioners often waste (computation) time with unintuitive trial-and-error tuning. This paper demystifies reward function design by systematically analyzing the impact of effort minimization, task completion bonuses, and target proximity incentives on typical HCI tasks such as pointing, tracking, and choice reaction. We show that proximity incentives are essential for guiding movement, while completion bonuses ensure task success. Effort terms, though optional, help refine motion regularity when appropriately scaled. We perform an extensive analysis of how sensitive task success and completion time depend on the weights of these three reward components. From these results we derive practical guidelines to create plausible biomechanical simulations without the need for reinforcement learning expertise, which we then validate on remote control and keyboard typing tasks. This paper advances simulation-based interaction design and evaluation in HCI by improving the efficiency and applicability of biomechanical user modeling for real-world interface development.
Authors:Hongqi Li, Yitong Chen, Yujuan Wang, Weihang Ni, Haodong Zhang
Abstract:
Electroencephalography (EEG) analysis stands at the forefront of neuroscience and artificial intelligence research, where foundation models are reshaping the traditional EEG analysis paradigm by leveraging their powerful representational capacity and cross-modal generalization. However, the rapid proliferation of these techniques has led to a fragmented research landscape, characterized by diverse model roles, inconsistent architectures, and a lack of systematic categorization. To bridge this gap, this study presents the first comprehensive modality-oriented taxonomy for foundation models in EEG analysis, systematically organizing research advances based on output modalities of the native EEG decoding, EEG-text, EEG-vision, EEG-audio, and broader multimodal frameworks. We rigorously analyze each category's research ideas, theoretical foundations, and architectural innovations, while highlighting open challenges such as model interpretability, cross-domain generalization, and real-world applicability in EEG-based systems. By unifying this dispersed field, our work not only provides a reference framework for future methodology development but accelerates the translation of EEG foundation models into scalable, interpretable, and online actionable solutions.
Authors:Yuansong Xu, Shuhao Zhang, Yijie Fan, Shaohan Shi, Zhenhui Peng, Quan Li
Abstract:
Effectively assimilating and integrating reviewer feedback is crucial for researchers seeking to refine their papers and handle potential rebuttal phases in academic venues. However, traditional review digestion processes present challenges such as time consumption, reading fatigue, and the requisite for comprehensive analytical skills. Prior research on review analysis often provides theoretical guidance with limited targeted support. Additionally, general text comprehension tools overlook the intricate nature of comprehensively understanding reviews and lack contextual assistance. To bridge this gap, we formulated research questions to explore the authors' concerns and methods for enhancing comprehension during the review digestion phase. Through interviews and the creation of storyboards, we developed ReviseMate, an interactive system designed to address the identified challenges. A controlled user study (N=31) demonstrated the superiority of ReviseMate over baseline methods, with positive feedback regarding user interaction. Subsequent field deployment (N=6) further validated the effectiveness of ReviseMate in real-world review digestion scenarios. These findings underscore the potential of interactive tools to significantly enhance the assimilation and integration of reviewer feedback during the manuscript review process.
Authors:Longfei Chen, Shenghan Gao, Shiwei Wang, Ken Lin, Yun Wang, Quan Li
Abstract:
Conversational user interfaces powered by large language models (LLMs) have significantly lowered the technical barriers to database querying. However, existing tools still encounter several challenges, such as misinterpretation of user intent, generation of hallucinated content, and the absence of effective mechanisms for human feedback-all of which undermine their reliability and practical utility. To address these issues and promote a more transparent and controllable querying experience, we proposed QueryGenie, an interactive system that enables users to monitor, understand, and guide the LLM-driven query generation process. Through incremental reasoning, real-time validation, and responsive interaction mechanisms, users can iteratively refine query logic and ensure alignment with their intent.
Authors:Veronica Ruozzi, Sasan Matinfar, Laura Schütz, Benedikt Wiestler, Alberto Redaelli, Emiliano Votta, Nassir Navab
Abstract:
Perceptualizing tool interactions with deformable structures in surgical procedures remains challenging, as unimodal visualization techniques often fail to capture the complexity of these interactions due to constraints such as occlusion and limited depth perception. This paper presents a novel approach to augment tool navigation in mixed reality environments by providing auditory representations of tool-tissue dynamics, particularly for interactions with soft tissue. BioSonix, a physics-informed design framework, utilizes tissue displacements in 3D space to compute excitation forces for a sound model encoding tissue properties such as stiffness and density. Biomechanical simulations were employed to model particle displacements resulting from tool-tissue interactions, establishing a robust foundation for the method. An optimization approach was used to define configurations for capturing diverse interaction scenarios with varying tool trajectories. Experiments were conducted to validate the accuracy of the sound-displacement mappings. Additionally, two user studies were performed: the first involved two clinical professionals (a neuroradiologist and a cardiologist), who confirmed the method's impact and achieved high task accuracy; the second included 22 biomedical experts, who demonstrated high discrimination accuracy in tissue differentiation and targeting tasks. The results revealed a strong correlation between tool-tissue dynamics and their corresponding auditory profiles, highlighting the potential of these sound representations to enhance the intuitive understanding of complex interactions.
Authors:Arthur Fleig, Florian Fischer, Markus Klar, Patrick Ebel, Miroslav Bachinski, Per Ola Kristensson, Roderick Murray-Smith, Antti Oulasvirta
Abstract:
Computational models of how users perceive and act within a virtual or physical environment offer enormous potential for the understanding and design of user interactions. Cognition models have been used to understand the role of attention and individual preferences and beliefs on human decision making during interaction, while biomechanical simulations have been successfully applied to analyse and predict physical effort, fatigue, and discomfort. The next frontier in HCI lies in connecting these models to enable robust, diverse, and representative simulations of different user groups. These embodied user simulations could predict user intents, strategies, and movements during interaction more accurately, benchmark interfaces and interaction techniques in terms of performance and ergonomics, and guide adaptive system design. This UIST workshop explores ideas for integrating computational models into HCI and discusses use cases such as UI/UX design, automated system testing, and personalised adaptive interfaces. It brings researchers from relevant disciplines together to identify key opportunities and challenges as well as feasible next steps for bridging mind and motion to simulate interactive user behaviour.
Authors:Hongtao Liu, Zhicheng Du, Zihe Wang, Weiran Shen
Abstract:
Game-playing ability serves as an indicator for evaluating the strategic reasoning capability of large language models (LLMs). While most existing studies rely on utility performance metrics, which are not robust enough due to variations in opponent behavior and game structure. To address this limitation, we propose \textbf{Cognitive Hierarchy Benchmark (CHBench)}, a novel evaluation framework inspired by the cognitive hierarchy models from behavioral economics. We hypothesize that agents have bounded rationality -- different agents behave at varying reasoning depths/levels. We evaluate LLMs' strategic reasoning through a three-phase systematic framework, utilizing behavioral data from six state-of-the-art LLMs across fifteen carefully selected normal-form games. Experiments show that LLMs exhibit consistent strategic reasoning levels across diverse opponents, confirming the framework's robustness and generalization capability. We also analyze the effects of two key mechanisms (Chat Mechanism and Memory Mechanism) on strategic reasoning performance. Results indicate that the Chat Mechanism significantly degrades strategic reasoning, whereas the Memory Mechanism enhances it. These insights position CHBench as a promising tool for evaluating LLM capabilities, with significant potential for future research and practical applications.
Authors:Marjan Naghshbandi, Sharon Ferguson, Alison Olechowski
Abstract:
While the unique challenges of hybrid work can compromise collaboration and team dynamics, hybrid teams can thrive with well-informed strategies and tools that nurture interpersonal engagements. To inform future supports, we pursue a mixed-methods study of hybrid engineering design capstone teams' Psychological Safety (PS) (i.e., their climate of interpersonal risk-taking and mutual respect) to understand how the construct manifests in teams engaged in innovation. Using interviews, we study six teams' perceptions of PS indicators and how they present differently on Slack (when compared to in-person interactions). We then leverage the interview insights to design Slack-based PS indicators. We present five broad facets of PS in hybrid teams, four perceived differences of PS on Slack compared to in-person, and 15 Slack-based, PS indicators--the groundwork for future automated PS measurement on instant-messaging platforms. These insights produce three design implications and illustrative design examples for ways instant-messaging platforms can support Psychologically Safe hybrid teams, and best practices for hybrid teams to support interpersonal risk-taking and build mutual respect.
Authors:Denmar Mojan Gonzales, Snehanjali Kalamkar, Sophie Jörg, Jens Grubert
Abstract:
When communicating with embodied conversational agents (ECAs) in virtual reality, there might be delays in the responses of the agents lasting several seconds, for example, due to more extensive computations of the answers when large language models are used. Such delays might lead to unnatural or frustrating interactions. In this paper, we investigate filler types to mitigate these effects and lead to a more positive experience and perception of the agent. In a within-subject study, we asked 24 participants to communicate with ECAs in virtual reality, comparing four strategies displayed during the delays: a multimodal behavioral filler consisting of conversational and gestural fillers, a base condition with only idle motions, and two symbolic indicators with progress bars, one embedded as a badge on the agent, the other one external and visualized as a thinking bubble. Our results indicate that the behavioral filler improved perceived response time, three subscales of presence, humanlikeness, and naturalness. Participants looked away from the face more often when symbolic indicators were displayed, but the visualizations did not lead to a more positive impression of the agent or to increased presence. The majority of participants preferred the behavioral fillers, only 12.5% and 4.2% favored the symbolic embedded and external conditions, respectively.
Authors:Ashwin Kumar, Sanket Shah, Meghna Lowalekar, Pradeep Varakantham, Alvitta Ottley, William Yeoh
Abstract:
There is growing interest in algorithms that match passengers with drivers in ride-sharing problems and their fairness for the different parties involved (passengers, drivers, and ride-sharing companies). Researchers have proposed various fairness metrics for matching algorithms, but it is often unclear how one should balance the various parties' fairness, given that they are often in conflict. We present FairVizARD, a visualization-based system that aids users in evaluating the fairness of ride-sharing matching algorithms. FairVizARD presents the algorithms' results by visualizing relevant spatio-temporal information using animation and aggregated information in charts. FairVizARD also employs efficient techniques for visualizing a large amount of information in a user friendly manner, which makes it suitable for real-world settings. We conduct our experiments on a real-world large-scale taxi dataset and, through user studies and an expert interview, we show how users can use FairVizARD not only to evaluate the fairness of matching algorithms but also to expand on their notions of fairness.
Authors:Junyeon Kim, Tianshu Ruan, Cesar Alan Contreras, Manolis Chiou
Abstract:
Structural inspection in nuclear facilities is vital for maintaining operational safety and integrity. Traditional methods of manual inspection pose significant challenges, including safety risks, high cognitive demands, and potential inaccuracies due to human limitations. Recent advancements in Artificial Intelligence (AI) and robotic technologies have opened new possibilities for safer, more efficient, and accurate inspection methodologies. Specifically, Human-Robot Collaboration (HRC), leveraging robotic platforms equipped with advanced detection algorithms, promises significant improvements in inspection outcomes and reductions in human workload. This study explores the effectiveness of AI-assisted visual crack detection integrated into a mobile Jackal robot platform. The experiment results indicate that HRC enhances inspection accuracy and reduces operator workload, resulting in potential superior performance outcomes compared to traditional manual methods.
Authors:Mithat Can Ozgun, Jiahuan Pei, Koen Hindriks, Lucia Donatelli, Qingzhi Liu, Junxiao Wang
Abstract:
LLM-based agents have emerged as transformative tools capable of executing complex tasks through iterative planning and action, achieving significant advancements in understanding and addressing user needs. Yet, their effectiveness remains limited in specialized domains such as mental health diagnosis, where they underperform compared to general applications. Current approaches to integrating diagnostic capabilities into LLMs rely on scarce, highly sensitive mental health datasets, which are challenging to acquire. These methods also fail to emulate clinicians' proactive inquiry skills, lack multi-turn conversational comprehension, and struggle to align outputs with expert clinical reasoning. To address these gaps, we propose DSM5AgentFlow, the first LLM-based agent workflow designed to autonomously generate DSM-5 Level-1 diagnostic questionnaires. By simulating therapist-client dialogues with specific client profiles, the framework delivers transparent, step-by-step disorder predictions, producing explainable and trustworthy results. This workflow serves as a complementary tool for mental health diagnosis, ensuring adherence to ethical and legal standards. Through comprehensive experiments, we evaluate leading LLMs across three critical dimensions: conversational realism, diagnostic accuracy, and explainability. Our datasets and implementations are fully open-sourced.
Authors:Francesco Sovrano, Gabriele Dominici, Rita Sevastjanova, Alessandra Stramiglio, Alberto Bacchelli
Abstract:
Human cognitive biases in software engineering can lead to costly errors. While general-purpose AI (GPAI) systems may help mitigate these biases due to their non-human nature, their training on human-generated data raises a critical question: Do GPAI systems themselves exhibit cognitive biases?
To investigate this, we present the first dynamic benchmarking framework to evaluate data-induced cognitive biases in GPAI within software engineering workflows. Starting with a seed set of 16 hand-crafted realistic tasks, each featuring one of 8 cognitive biases (e.g., anchoring, framing) and corresponding unbiased variants, we test whether bias-inducing linguistic cues unrelated to task logic can lead GPAI systems from correct to incorrect conclusions.
To scale the benchmark and ensure realism, we develop an on-demand augmentation pipeline relying on GPAI systems to generate task variants that preserve bias-inducing cues while varying surface details. This pipeline ensures correctness (88--99% on average, according to human evaluation), promotes diversity, and controls reasoning complexity by leveraging Prolog-based reasoning and LLM-as-a-judge validation. It also verifies that the embedded biases are both harmful and undetectable by logic-based, unbiased reasoners.
We evaluate leading GPAI systems (GPT, LLaMA, DeepSeek) and find a consistent tendency to rely on shallow linguistic heuristics over deep reasoning. All systems exhibit cognitive biases (ranging from 5.9% to 35% across types), with bias sensitivity increasing sharply with task complexity (up to 49%), highlighting critical risks in real-world software engineering deployments.
Authors:Samantha Aziz, Oleg Komogortsev
Abstract:
We present a privacy-enhancing mechanism for gaze signals using a latent-noise autoencoder that prevents users from being re-identified across play sessions without their consent, while retaining the usability of the data for benign tasks. We evaluate privacy-utility trade-offs across biometric identification and gaze prediction tasks, showing that our approach significantly reduces biometric identifiability with minimal utility degradation. Unlike prior methods in this direction, our framework retains physiologically plausible gaze patterns suitable for downstream use, which produces favorable privacy-utility trade-off. This work advances privacy in gaze-based systems by providing a usable and effective mechanism for protecting sensitive gaze data.
Authors:Lisa Haxel, Jaivardhan Kapoor, Ulf Ziemann, Jakob H. Macke
Abstract:
Brain-computer interfaces (BCIs) suffer from accuracy degradation as neural signals drift over time and vary across users, requiring frequent recalibration that limits practical deployment. We introduce EDAPT, a task- and model-agnostic framework that eliminates calibration through continual model adaptation. EDAPT first trains a baseline decoder using data from multiple users, then continually personalizes this model via supervised finetuning as the neural patterns evolve during use. We tested EDAPT across nine datasets covering three BCI tasks, and found that it consistently improved accuracy over conventional, static methods. These improvements primarily stem from combining population-level pretraining and online continual finetuning, with unsupervised domain adaptation providing further gains on some datasets. EDAPT runs efficiently, updating models within 200 milliseconds on consumer-grade hardware. Finally, decoding accuracy scales with total data budget rather than its allocation between subjects and trials. EDAPT provides a practical pathway toward calibration-free BCIs, reducing a major barrier to BCI deployment.
Authors:Mahdi Dhaini, Tobias Müller, Roksoliana Rabets, Gjergji Kasneci
Abstract:
The field of explainable natural language processing (NLP) has grown rapidly in recent years. The growing opacity of complex models calls for transparency and explanations of their decisions, which is crucial to understand their reasoning and facilitate deployment, especially in high-stakes environments. Despite increasing attention given to explainable NLP, practitioners' perspectives regarding its practical adoption and effectiveness remain underexplored. This paper addresses this research gap by investigating practitioners' experiences with explainability methods, specifically focusing on their motivations for adopting such methods, the techniques employed, satisfaction levels, and the practical challenges encountered in real-world NLP applications. Through a qualitative interview-based study with industry practitioners and complementary interviews with academic researchers, we systematically analyze and compare their perspectives. Our findings reveal conceptual gaps, low satisfaction with current explainability methods, and highlight evaluation challenges. Our findings emphasize the need for clear definitions and user-centric frameworks for better adoption of explainable NLP in practice.
Authors:Ruchira Dhar, Stephanie Brandl, Ninell Oldenburg, Anders Søgaard
Abstract:
The field of Explainable AI (XAI) offers a wide range of techniques for making complex models interpretable. Yet, in practice, generating meaningful explanations is a context-dependent task that requires intentional design choices to ensure accessibility and transparency. This paper reframes explanation as a situated design process -- an approach particularly relevant for practitioners involved in building and deploying explainable systems. Drawing on prior research and principles from design thinking, we propose a three-part framework for explanation design in XAI: asking Who needs the explanation, What they need explained, and How that explanation should be delivered. We also emphasize the need for ethical considerations, including risks of epistemic inequality, reinforcing social inequities, and obscuring accountability and governance. By treating explanation as a sociotechnical design process, this framework encourages a context-aware approach to XAI that supports effective communication and the development of ethically responsible explanations.
Authors:Tina Behzad, Nikolos Gurney, Ning Wang, David V. Pynadath
Abstract:
The promise of human-AI teaming lies in humans and AI working together to achieve performance levels neither could accomplish alone. Effective communication between AI and humans is crucial for teamwork, enabling users to efficiently benefit from AI assistance. This paper investigates how AI communication impacts human-AI team performance. We examine AI explanations that convey an awareness of its strengths and limitations. To achieve this, we train a decision tree on the model's mistakes, allowing it to recognize and explain where and why it might err. Through a user study on an income prediction task, we assess the impact of varying levels of information and explanations about AI predictions. Our results show that AI performance insights enhance task performance, and conveying AI awareness of its strengths and weaknesses improves trust calibration. These findings highlight the importance of considering how information delivery influences user trust and reliance in AI-assisted decision-making.
Authors:Galen Weld, Carl Pearson, Bradley Spahn, Tim Althoff, Amy X. Zhang, Sanjay Kairam
Abstract:
Sense of Community (SOC) is vital to individual and collective well-being. Although social interactions have moved increasingly online, still little is known about the specific relationships between the nature of these interactions and Sense of Virtual Community (SOVC). This study addresses this gap by exploring how conversational structure and linguistic style predict SOVC in online communities, using a large-scale survey of 2,826 Reddit users across 281 varied subreddits. We develop a hierarchical model to predict self-reported SOVC based on automatically quantifiable and highly generalizable features that are agnostic to community topic and that describe both individual users and entire communities. We identify specific interaction patterns (e.g., reciprocal reply chains, use of prosocial language) associated with stronger communities and identify three primary dimensions of SOVC within Reddit -- Membership & Belonging, Cooperation & Shared Values, and Connection & Influence. This study provides the first quantitative evidence linking patterns of social interaction to SOVC and highlights actionable strategies for fostering stronger community attachment, using an approach that can generalize readily across community topics, languages, and platforms. These insights offer theoretical implications for the study of online communities and practical suggestions for the design of features to help more individuals experience the positive benefits of online community participation.
Authors:Vaibhav Gupta, Maria Maleshkova
Abstract:
Recent advances in wearable technology have enabled the continuous monitoring of vital physiological signals, essential for predictive modeling and early detection of extreme physiological events. Among these physiological signals, heart rate (HR) plays a central role, as it is widely used in monitoring and managing cardiovascular conditions and detecting extreme physiological events such as hypoglycemia. However, data from wearable devices often suffer from missing values. To address this issue, recent studies have employed various imputation techniques. Traditionally, the effectiveness of these methods has been evaluated using predictive accuracy metrics such as RMSE, MAPE, and MAE, which assess numerical proximity to the original data. While informative, these metrics fail to capture the complex statistical structure inherent in physiological signals. This study bridges this gap by presenting a comprehensive evaluation of four statistical imputation methods, linear interpolation, K Nearest Neighbors (KNN), Piecewise Cubic Hermite Interpolating Polynomial (PCHIP), and B splines, for short term HR data gaps. We assess their performance using both predictive accuracy metrics and statistical distance measures, including the Cohen Distance Test (CDT) and Jensen Shannon Distance (JS Distance), applied to HR data from the D1NAMO dataset and the BIG IDEAs Lab Glycemic Variability and Wearable Device dataset. The analysis reveals limitations in existing imputation approaches and the absence of a robust framework for evaluating imputation quality in physiological signals. Finally, this study proposes a foundational framework to develop a composite evaluation metric to assess imputation performance.
Authors:Yuya Kawakami, Daniel Cayan, Dongyu Liu, Kwan-Liu Ma
Abstract:
Ensemble datasets are ever more prevalent in various scientific domains. In climate science, ensemble datasets are used to capture variability in projections under plausible future conditions including greenhouse and aerosol emissions. Each ensemble model run produces projections that are fundamentally similar yet meaningfully distinct. Understanding this variability among ensemble model runs and analyzing its magnitude and patterns is a vital task for climate scientists. In this paper, we present ClimateSOM, a visual analysis workflow that leverages a self-organizing map (SOM) and Large Language Models (LLMs) to support interactive exploration and interpretation of climate ensemble datasets. The workflow abstracts climate ensemble model runs - spatiotemporal time series - into a distribution over a 2D space that captures the variability among the ensemble model runs using a SOM. LLMs are integrated to assist in sensemaking of this SOM-defined 2D space, the basis for the visual analysis tasks. In all, ClimateSOM enables users to explore the variability among ensemble model runs, identify patterns, compare and cluster the ensemble model runs. To demonstrate the utility of ClimateSOM, we apply the workflow to an ensemble dataset of precipitation projections over California and the Northwestern United States. Furthermore, we conduct a short evaluation of our LLM integration, and conduct an expert review of the visual workflow and the insights from the case studies with six domain experts to evaluate our approach and its utility.
Authors:Sizhe Cheng, Jiaping Li, Huanchen Wang, Yuxin Ma
Abstract:
Retrieval-Augmented Generation (RAG) systems have emerged as a promising solution to enhance large language models (LLMs) by integrating external knowledge retrieval with generative capabilities. While significant advancements have been made in improving retrieval accuracy and response quality, a critical challenge remains that the internal knowledge integration and retrieval-generation interactions in RAG workflows are largely opaque. This paper introduces RAGTrace, an interactive evaluation system designed to analyze retrieval and generation dynamics in RAG-based workflows. Informed by a comprehensive literature review and expert interviews, the system supports a multi-level analysis approach, ranging from high-level performance evaluation to fine-grained examination of retrieval relevance, generation fidelity, and cross-component interactions. Unlike conventional evaluation practices that focus on isolated retrieval or generation quality assessments, RAGTrace enables an integrated exploration of retrieval-generation relationships, allowing users to trace knowledge sources and identify potential failure cases. The system's workflow allows users to build, evaluate, and iterate on retrieval processes tailored to their specific domains of interest. The effectiveness of the system is demonstrated through case studies and expert evaluations on real-world RAG applications.
Authors:Yifan Wang, Hongfeng Ai, Ruiqi Li, Maowei Jiang, Ruiyuan Kang, Jiahua Dong, Cheng Jiang, Chenzhong Li
Abstract:
In medical time series disease diagnosis, two key challenges are identified. First, the high annotation cost of medical data leads to overfitting in models trained on label-limited, single-center datasets. To address this, we propose incorporating external data from related tasks and leveraging AE-GAN to extract prior knowledge, providing valuable references for downstream tasks. Second, many existing studies employ contrastive learning to derive more generalized medical sequence representations for diagnostic tasks, usually relying on manually designed diverse positive and negative sample pairs. However, these approaches are complex, lack generalizability, and fail to adaptively capture disease-specific features across different conditions. To overcome this, we introduce LMCF (Learnable Multi-views Contrastive Framework), a framework that integrates a multi-head attention mechanism and adaptively learns representations from different views through inter-view and intra-view contrastive learning strategies. Additionally, the pre-trained AE-GAN is used to reconstruct discrepancies in the target data as disease probabilities, which are then integrated into the contrastive learning process. Experiments on three target datasets demonstrate that our method consistently outperforms other seven baselines, highlighting its significant impact on healthcare applications such as the diagnosis of myocardial infarction, Alzheimer's disease, and Parkinson's disease. We release the source code at xxxxx.
Authors:Jelle Luijkx, Zlatan AjanoviÄ, Laura Ferranti, Jens Kober
Abstract:
Human teaching effort is a significant bottleneck for the broader applicability of interactive imitation learning. To reduce the number of required queries, existing methods employ active learning to query the human teacher only in uncertain, risky, or novel situations. However, during these queries, the novice's planned actions are not utilized despite containing valuable information, such as the novice's capabilities, as well as corresponding uncertainty levels. To this end, we allow the novice to say: "I plan to do this, but I am uncertain." We introduce the Active Skill-level Data Aggregation (ASkDAgger) framework, which leverages teacher feedback on the novice plan in three key ways: (1) S-Aware Gating (SAG): Adjusts the gating threshold to track sensitivity, specificity, or a minimum success rate; (2) Foresight Interactive Experience Replay (FIER), which recasts valid and relabeled novice action plans into demonstrations; and (3) Prioritized Interactive Experience Replay (PIER), which prioritizes replay based on uncertainty, novice success, and demonstration age. Together, these components balance query frequency with failure incidence, reduce the number of required demonstration annotations, improve generalization, and speed up adaptation to changing domains. We validate the effectiveness of ASkDAgger through language-conditioned manipulation tasks in both simulation and real-world environments. Code, data, and videos are available at https://askdagger.github.io.
Authors:Zhehan Qu, Tianyi Hu, Christian Fronk, Maria Gorlatova
Abstract:
Augmented Reality (AR) systems, while enhancing task performance through real-time guidance, pose risks of inducing cognitive tunneling-a hyperfocus on virtual content that compromises situational awareness (SA) in safety-critical scenarios. This paper investigates SA in AR-guided cardiopulmonary resuscitation (CPR), where responders must balance effective compressions with vigilance to unpredictable hazards (e.g., patient vomiting). We developed an AR app on a Magic Leap 2 that overlays real-time CPR feedback (compression depth and rate) and conducted a user study with simulated unexpected incidents (e.g., bleeding) to evaluate SA, in which SA metrics were collected via observation and questionnaires administered during freeze-probe events. Eye tracking analysis revealed that higher SA levels were associated with greater saccadic amplitude and velocity, and with reduced proportion and frequency of fixations on virtual content. To predict SA, we propose FixGraphPool, a graph neural network that structures gaze events (fixations, saccades) into spatiotemporal graphs, effectively capturing dynamic attentional patterns. Our model achieved 83.0% accuracy (F1=81.0%), outperforming feature-based machine learning and state-of-the-art time-series models by leveraging domain knowledge and spatial-temporal information encoded in ET data. These findings demonstrate the potential of eye tracking for SA modeling in AR and highlight its utility in designing AR systems that ensure user safety and situational awareness.
Authors:Oliver Huang, Carolina Nobre
Abstract:
Analysts increasingly explore data through evolving, narrative-driven inquiries, moving beyond static dashboards and predefined metrics as their questions deepen and shift. As these explorations progress, insights often become dispersed across views, making it challenging to maintain context or clarify how conclusions arise. Through a formative study with 48 participants, we identify key barriers that hinder narrative-driven exploration, including difficulty maintaining context across views, tracing reasoning paths, and externalizing evolving interpretations. Our findings surface design opportunities to support narrative-driven analysis better.
Authors:Duc-An Nguyen, Clara Colombatto, Steve Fleming, Ingmar Posner, Nick Hawes, Raunak Bhattacharyya
Abstract:
Joint human-AI inference holds immense potential to improve outcomes in human-supervised robot missions. Current day missions are generally in the AI-assisted setting, where the human operator makes the final inference based on the AI recommendation. However, due to failures in human judgement on when to accept or reject the AI recommendation, complementarity is rarely achieved. We investigate joint human-AI inference where the inference made with higher confidence is selected. Through a user study with N=100 participants on a representative simulated robot teleoperation task, specifically studying the inference of robots' control delays we show that: a) Joint inference accuracy is higher and its extent is regulated by the confidence calibration of the AI agent, and b) Humans change their inferences based on AI recommendations and the extent and direction of this change is also regulated by the confidence calibration of the AI agent. Interestingly, our results show that pairing poorly-calibrated AI-DSS with humans hurts performance instead of helping the team, reiterating the need for AI-based decision support systems with good metacognitive sensitivity. To the best of our knowledge, our study presents the first application of a maximum-confidence-based heuristic for joint human-AI inference within a simulated robot teleoperation task.
Authors:Ramaswamy Palaniappan, Surej Mouli, Howard Bowman, Ian McLoughlin
Abstract:
Half of all road accidents result from either lack of driver attention or from maintaining insufficient separation between vehicles. Collision from the rear, in particular, has been identified as the most common class of accident in the UK, and its influencing factors have been widely studied for many years. Rear-mounted stop lamps, illuminated when braking, are the primary mechanism to alert following drivers to the need to reduce speed or brake. This paper develops a novel brain response approach to measuring subject reaction to different brake light designs. A variety of off-the-shelf brake light assemblies are tested in a physical simulated driving environment to assess the cognitive reaction times of 22 subjects. Eight pairs of LED-based and two pairs of incandescent bulb-based brake light assemblies are used and electroencephalogram (EEG) data recorded. Channel Pz is utilised to extract the P3 component evoked during the decision making process that occurs in the brain when a participant decides to lift their foot from the accelerator and depress the brake. EEG analysis shows that both incandescent bulb-based lights are statistically slower to evoke cognitive responses than all tested LED-based lights. Between the LED designs, differences are evident, but not statistically significant, attributed to the significant amount of movement artifact in the EEG signal.
Authors:Sangho Suh, Michael Lai, Kevin Pu, Steven P. Dow, Tovi Grossman
Abstract:
Design processes involve exploration, iteration, and movement across interconnected stages such as persona creation, problem framing, solution ideation, and prototyping. However, time and resource constraints often hinder designers from exploring broadly, collecting feedback, and revisiting earlier assumptions-making it difficult to uphold core design principles in practice. To better understand these challenges, we conducted a formative study with 15 participants-comprised of UX practitioners, students, and instructors. Based on the findings, we developed StoryEnsemble, a tool that integrates AI into a node-link interface and leverages forward and backward propagation to support dynamic exploration and iteration across the design process. A user study with 10 participants showed that StoryEnsemble enables rapid, multi-directional iteration and flexible navigation across design stages. This work advances our understanding of how AI can foster more iterative design practices by introducing novel interactions that make exploration and iteration more fluid, accessible, and engaging.
Authors:Nilesh Kumar Sahu, Aditya Sneh, Snehil Gupta, Haroon R Lone
Abstract:
The rise of mobile health (mHealth) technologies has enabled real-time monitoring and intervention for mental health conditions using passively sensed smartphone data. Building on these capabilities, Just-in-Time Adaptive Interventions (JITAIs) seek to deliver personalized support at opportune moments, adapting to users' evolving contexts and needs. Although prior research has examined how context affects user responses to generic notifications and general mHealth messages, relatively little work has explored its influence on engagement with actual mental health interventions. Furthermore, while much of the existing research has focused on detecting when users might benefit from an intervention, less attention has been paid to understanding receptivity, i.e., users' willingness and ability to engage with and act upon the intervention.
In this study, we investigate user receptivity through two components: acceptance(acknowledging or engaging with a prompt) and feasibility (ability to act given situational constraints). We conducted a two-week in-the-wild study with 70 students using a custom Android app, LogMe, which collected passive sensor data and active context reports to prompt mental health interventions. The adaptive intervention module was built using Thompson Sampling, a reinforcement learning algorithm. We address four research questions relating smartphone features and self-reported contexts to acceptance and feasibility, and examine whether an adaptive reinforcement learning approach can optimize intervention delivery by maximizing a combined receptivity reward. Our results show that several types of passively sensed data significantly influenced user receptivity to interventions. Our findings contribute insights into the design of context-aware, adaptive interventions that are not only timely but also actionable in real-world settings.
Authors:Matus Krajcovic, Peter Demcak, Eduard Kuric
Abstract:
Embodied conversational agents (ECAs) are increasingly more realistic and capable of dynamic conversations. In online surveys, anthropomorphic agents could help address issues like careless responding and satisficing, which originate from the lack of personal engagement and perceived accountability. However, there is a lack of understanding of how ECAs in user experience research may affect participant engagement, satisfaction, and the quality of responses. As a proof of concept, we propose an instrument that enables the incorporation of conversations with a virtual avatar into surveys, using on AI-driven video generation, speech recognition, and Large Language Models. In our between-subjects study, 80 participants (UK, stratified random sample of general population) either talked to a voice-based agent with an animated video avatar, or interacted with a chatbot. Across surveys based on two self-reported psychometric tests, 2,265 conversation responses were obtained. Statistical comparison of results indicates that embodied agents can contribute significantly to more informative, detailed responses, as well as higher yet more time-efficient engagement. Furthermore, qualitative analysis provides valuable insights for causes of no significant change to satisfaction, linked to personal preferences, turn-taking delays and Uncanny Valley reactions. These findings support the pursuit and development of new methods toward human-like agents for the transformation of online surveys into more natural interactions resembling in-person interviews.
Authors:Surej Mouli, Ramaswamy Palaniappan
Abstract:
A fully customisable chip-on board (COB) LED design to evoke two brain responses simultaneously (steady state visual evoked potential (SSVEP) and transient evoked potential, P300) is discussed in this paper. Considering different possible modalities in braincomputer interfacing (BCI), SSVEP is widely accepted as it requires a lesser number of electroencephalogram (EEG) electrodes and minimal training time. The aim of this work was to produce a hybrid BCI hardware platform to evoke SSVEP and P300 precisely with reduced fatigue and improved classification performance. The system comprises of four independent radial green visual stimuli controlled individually by a 32-bit microcontroller platform to evoke SSVEP and four red LEDs flashing at random intervals to generate P300 events. The system can also record the P300 event timestamps that can be used in classification, to improve the accuracy and reliability. The hybrid stimulus was tested for realtime classification accuracy by controlling a LEGO robot to move in four directions.
Authors:Martin J. O'Connor, Marcos Martinez-Romero, Attila L. Egyedi, Mete U. Akdogan, Michael V. Dorf, Mark A. Musen
Abstract:
High-quality, "rich" metadata are essential for making research data findable, interoperable, and reusable. The Center for Expanded Data Annotation and Retrieval (CEDAR) has long addressed this need by providing tools to design machine-actionable metadata templates that encode community standards in a computable form. To make these capabilities more accessible within real-world research workflows, we have developed the CEDAR Embeddable Editor (CEE)-a lightweight, interoperable Web Component that brings structured, standards-based metadata authoring directly into third-party platforms. The CEE dynamically renders metadata forms from machine-actionable templates and produces semantically rich metadata in JSON-LD format. It supports ontology-based value selection via the BioPortal ontology repository, and it includes external authority resolution for persistent identifiers such as ORCIDs for individuals and RORs for research organizations. Crucially, the CEE requires no custom user-interface development, allowing deployment across diverse platforms. The CEE has been successfully integrated into generalist scientific data repositories such as Dryad and the Open Science Framework, demonstrating its ability to support discipline-specific metadata creation. By supporting the embedding of metadata authoring within existing research environments, the CEE can facilitate the adoption of community standards and help improve metadata quality across scientific disciplines.
Authors:Kaustav Bhattacharjee, Jun Yuan, Aritra Dasgupta
Abstract:
The recent adoption of artificial intelligence in socio-technical systems raises concerns about the black-box nature of the resulting decisions in fields such as hiring, finance, admissions, etc. If data subjects -- such as job applicants, loan applicants, and students -- receive an unfavorable outcome, they may be interested in algorithmic recourse, which involves updating certain features to yield a more favorable result when re-evaluated by algorithmic decision-making. Unfortunately, when individuals do not fully understand the incremental steps needed to change their circumstances, they risk following misguided paths that can lead to significant, long-term adverse consequences. Existing recourse approaches focus exclusively on the final recourse goal but neglect the possible incremental steps to reach the goal with real-life constraints, user preferences, and model artifacts. To address this gap, we formulate a visual analytic workflow for incremental recourse planning in collaboration with AI/ML experts and contribute an interactive visualization interface that helps data subjects efficiently navigate the recourse alternatives and make an informed decision. We present a usage scenario and subjective feedback from observational studies with twelve graduate students using a real-world dataset, which demonstrates that our approach can be instrumental for data subjects in choosing a suitable recourse path.
Authors:Ryo Miyoshi, Yuki Okafuji, Takuya Iwamoto, Junya Nakanishi, Jun Baba
Abstract:
In recent years, the demand for social robots has grown, requiring them to adapt their behaviors based on users' states. Accurately assessing user experience (UX) in human-robot interaction (HRI) is crucial for achieving this adaptability. UX is a multi-faceted measure encompassing aspects such as sentiment and engagement, yet existing methods often focus on these individually. This study proposes a UX estimation method for HRI by leveraging multimodal social signals. We construct a UX dataset and develop a Transformer-based model that utilizes facial expressions and voice for estimation. Unlike conventional models that rely on momentary observations, our approach captures both short- and long-term interaction patterns using a multi-instance learning framework. This enables the model to capture temporal dynamics in UX, providing a more holistic representation. Experimental results demonstrate that our method outperforms third-party human evaluators in UX estimation.
Authors:Ethan Wilson, Vincent Bindschaedler, Sophie Jörg, Sean Sheikholeslam, Kevin Butler, Eakta Jain
Abstract:
Photorealistic 3D avatar generation has rapidly improved in recent years, and realistic avatars that match a user's true appearance are more feasible in Mixed Reality (MR) than ever before. Yet, there are known risks to sharing one's likeness online, and photorealistic MR avatars could exacerbate these risks. If user likenesses were to be shared broadly, there are risks for cyber abuse or targeted fraud based on user appearances. We propose an alternate avatar rendering scheme for broader social MR -- synthesizing realistic avatars that preserve a user's demographic identity while being distinct enough from the individual user to protect facial biometric information. We introduce a methodology for privatizing appearance by isolating identity within the feature space of identity-encoding generative models. We develop two algorithms that then obfuscate identity: \epsmethod{} provides differential privacy guarantees and \thetamethod{} provides fine-grained control for the level of identity offset. These methods are shown to successfully generate de-identified virtual avatars across multiple generative architectures in 2D and 3D. With these techniques, it is possible to protect user privacy while largely preserving attributes related to sense of self. Employing these techniques in public settings could enable the use of photorealistic avatars broadly in MR, maintaining high realism and immersion without privacy risk.
Authors:Alex Leitch, Celia Chen
Abstract:
As AI art generation becomes increasingly sophisticated, HCI research has focused primarily on questions of detection, authenticity, and automation. This paper argues that such approaches fundamentally misunderstand how artistic value emerges from the concerns that drive human image production. Through examination of historical precedents, we demonstrate that artistic style is not only visual appearance but the resolution of creative struggle, as artists wrestle with influence and technical constraints to develop unique ways of seeing. Current AI systems flatten these human choices into reproducible patterns without preserving their provenance. We propose that HCI's role lies not only in perfecting visual output, but in developing means to document the origins and evolution of artistic style as it appears within generated visual traces. This reframing suggests new technical directions for HCI research in generative AI, focused on automatic documentation of stylistic lineage and creative choice rather than simple reproduction of aesthetic effects.
Authors:Wieslaw KopeÄ, Anna Jaskulska, WÅadysÅaw Fuchs, Wiktor Stawski, StanisÅaw KnapiÅski, Barbara Karpowicz, RafaÅ MasÅyk
Abstract:
Digital technologies and tools have transformed the way we can study cultural heritage and the way we can recreate it digitally. Techniques such as laser scanning, photogrammetry, and a variety of Mixed Reality solutions have enabled researchers to examine cultural objects and artifacts more precisely and from new perspectives. In this part of the panel, we explore how Virtual Reality (VR) and eXtended Reality (XR) can serve as tools to recreate and visualize the remains of historical cultural heritage and experience it in simulations of its original complexity, which means immersive and interactive. Visualization of material culture exemplified by archaeological sites and architecture can be particularly useful when only ruins or archaeological remains survive. However, these advancements also bring significant challenges, especially in the area of transdisciplinary cooperation between specialists from many, often distant, fields, and the dissemination of virtual immersive environments among both professionals and the general public.
Authors:Sebastian Dohnány, Zeb Kurth-Nelson, Eleanor Spens, Lennart Luettgau, Alastair Reid, Iason Gabriel, Christopher Summerfield, Murray Shanahan, Matthew M Nour
Abstract:
Artificial intelligence chatbots have achieved unprecedented adoption, with millions now using these systems for emotional support and companionship in contexts of widespread social isolation and capacity-constrained mental health services. While some users report psychological benefits, concerning edge cases are emerging, including reports of suicide, violence, and delusional thinking linked to perceived emotional relationships with chatbots. To understand this new risk profile we need to consider the interaction between human cognitive and emotional biases, and chatbot behavioural tendencies such as agreeableness (sycophancy) and adaptability (in-context learning). We argue that individuals with mental health conditions face increased risks of chatbot-induced belief destabilization and dependence, owing to altered belief-updating, impaired reality-testing, and social isolation. Current AI safety measures are inadequate to address these interaction-based risks. To address this emerging public health concern, we need coordinated action across clinical practice, AI development, and regulatory frameworks.
Authors:Gioele Giachino, Marco Rondina, Antonio Vetrò, Riccardo Coppola, Juan Carlos De Martin
Abstract:
The increasing use of Large Language Models (LLMs) in a large variety of domains has sparked worries about how easily they can perpetuate stereotypes and contribute to the generation of biased content. With a focus on gender and professional bias, this work examines in which manner LLMs shape responses to ungendered prompts, contributing to biased outputs. This analysis uses a structured experimental method, giving different prompts involving three different professional job combinations, which are also characterized by a hierarchical relationship. This study uses Italian, a language with extensive grammatical gender differences, to highlight potential limitations in current LLMs' ability to generate objective text in non-English languages. Two popular LLM-based chatbots are examined, namely OpenAI ChatGPT (gpt-4o-mini) and Google Gemini (gemini-1.5-flash). Through APIs, we collected a range of 3600 responses. The results highlight how content generated by LLMs can perpetuate stereotypes. For example, Gemini associated 100% (ChatGPT 97%) of 'she' pronouns to the 'assistant' rather than the 'manager'. The presence of bias in AI-generated text can have significant implications in many fields, such as in the workplaces or in job selections, raising ethical concerns about its use. Understanding these risks is pivotal to developing mitigation strategies and assuring that AI-based systems do not increase social inequalities, but rather contribute to more equitable outcomes. Future research directions include expanding the study to additional chatbots or languages, refining prompt engineering methods or further exploiting a larger experimental base.
Authors:Meriem Zerkouk, Miloud Mihoubi, Belkacem Chikhaoui
Abstract:
AI-based Intelligent Tutoring Systems (ITS) have significant potential to transform teaching and learning. As efforts continue to design, develop, and integrate ITS into educational contexts, mixed results about their effectiveness have emerged. This paper provides a comprehensive review to understand how ITS operate in real educational settings and to identify the associated challenges in their application and evaluation. We use a systematic literature review method to analyze numerous qualified studies published from 2010 to 2025, examining domains such as pedagogical strategies, NLP, adaptive learning, student modeling, and domain-specific applications of ITS. The results reveal a complex landscape regarding the effectiveness of ITS, highlighting both advancements and persistent challenges. The study also identifies a need for greater scientific rigor in experimental design and data analysis. Based on these findings, suggestions for future research and practical implications are proposed.
Authors:Xue Wen Tan, Kenneth See, Stanley Kok
Abstract:
The rapid growth of messaging scams creates an escalating challenge for user security and financial safety. In this paper, we present the \textit{Anticipate, Simulate, Reason} (ASR) generative AI framework to enable users to proactively identify and comprehend scams within instant messaging platforms. Using large language models, ASR predicts scammer responses and delivers real-time, interpretable support to end-users. We also develop ScamGPT-J, a domain-specific language model fine-tuned on a new, high-quality dataset of scam conversations covering multiple scam types. Thorough experimental evaluation shows that the ASR framework substantially enhances scam detection, particularly in challenging contexts such as job scams, and uncovers important demographic patterns in user vulnerability and perceptions of AI-generated assistance. Our findings reveal a contradiction where those most at risk are often least receptive to AI support, emphasizing the importance of user-centered design in AI-driven fraud prevention. This work advances both the practical and theoretical foundations for interpretable and human-centered AI systems in combating evolving digital threats.
Authors:Haoran Jiang, Shaohan Shi, Yunjie Yao, Chang Jiang, Quan Li
Abstract:
Modern scientific discovery faces growing challenges in integrating vast and heterogeneous knowledge critical to breakthroughs in biomedicine and drug development. Traditional hypothesis-driven research, though effective, is constrained by human cognitive limits, the complexity of biological systems, and the high cost of trial-and-error experimentation. Deep learning models, especially graph neural networks (GNNs), have accelerated prediction generation, but the sheer volume of outputs makes manual selection for validation unscalable. Large language models (LLMs) offer promise in filtering and hypothesis generation, yet suffer from hallucinations and lack grounding in structured knowledge, limiting their reliability. To address these issues, we propose HypoChainer, a collaborative visualization framework that integrates human expertise, LLM-driven reasoning, and knowledge graphs (KGs) to enhance hypothesis generation and validation. HypoChainer operates in three stages: First, exploration and contextualization -- experts use retrieval-augmented LLMs (RAGs) and dimensionality reduction to navigate large-scale GNN predictions, assisted by interactive explanations. Second, hypothesis chain formation -- experts iteratively examine KG relationships around predictions and semantically linked entities, refining hypotheses with LLM and KG suggestions. Third, validation prioritization -- refined hypotheses are filtered based on KG-supported evidence to identify high-priority candidates for experimentation, with visual analytics further strengthening weak links in reasoning. We demonstrate HypoChainer's effectiveness through case studies in two domains and expert interviews, highlighting its potential to support interpretable, scalable, and knowledge-grounded scientific discovery.
Authors:Kayhan Latifzadeh, Luis A. Leiva, Klen ÄopiÄ Pucihar, Matjaž Kljun, Iztok Devetak, Lili Steblovnik
Abstract:
We examined eye and head movements to gain insights into skill development in clinical settings. A total of 24 practitioners participated in simulated baby delivery training sessions. We calculated key metrics, including pupillary response rate, fixation duration, or angular velocity. Our findings indicate that eye and head tracking can effectively differentiate between trained and untrained practitioners, particularly during labor tasks. For example, head-related features achieved an F1 score of 0.85 and AUC of 0.86, whereas pupil-related features achieved F1 score of 0.77 and AUC of 0.85. The results lay the groundwork for computational models that support implicit skill assessment and training in clinical settings by using commodity eye-tracking glasses as a complementary device to more traditional evaluation methods such as subjective scores.
Authors:Gennie Mansi, Mark Riedl
Abstract:
Physicians are--and feel--ethically, professionally, and legally responsible for patient outcomes, buffering patients from harmful AI determinations from medical AI systems. Many have called for explainable AI (XAI) systems to help physicians incorporate medical AI recommendations into their workflows in a way that reduces the potential of harms to patients. While prior work has demonstrated how physicians' legal concerns impact their medical decision making, little work has explored how XAI systems should be designed in light of these concerns. In this study, we conducted interviews with 10 physicians to understand where and how they anticipate errors that may occur with a medical AI system and how these anticipated errors connect to their legal concerns. In our study, physicians anticipated risks associated with using an AI system for patient care, but voiced unknowns around how their legal risk mitigation strategies may change given a new technical system. Based on these findings, we describe the implications for designing XAI systems that can address physicians' legal concerns. Specifically, we identify the need to provide AI recommendations alongside contextual information that guides their risk mitigation strategies, including how non-legally related aspects of their systems, such as medical documentation and auditing requests, might be incorporated into a legal case.
Authors:Gennie Mansi, Mark Riedl
Abstract:
Many calls for explainable AI (XAI) systems in medicine are tied to a desire for AI accountability--accounting for, mitigating, and ultimately preventing harms from AI systems. Because XAI systems provide human-understandable explanations for their output, they are often viewed as a primary path to prevent harms to patients. However, when harm occurs, laws, policies, and regulations also shape AI accountability by impacting how harmed individuals can obtain recourse. Current approaches to XAI explore physicians' medical and relational needs to counter harms to patients, but there is a need to understand how XAI systems should account for the legal considerations of those impacted. We conduct an analysis of 31 legal cases and reported harms to identify patterns around how AI systems impact patient care. Our findings reflect how patients' medical care relies on a complex web of stakeholders--physicians, state health departments, health insurers, care facilities, among others--and many AI systems deployed across their healthcare delivery negatively impact their care. In response, patients have had no option but to seek legal recourse for harms. We shift the frame from physician-centered to patient-centered accountability approaches by describing how lawyers and technologists need to recognize and address where AI harms happen. We present paths for preventing or countering harm (1) by changing liability structures to reflect the role of many stakeholders in shaping how AI systems impact patient care; and (2) by designing XAI systems that can help advocates, such as legal representatives, who provide critical legal expertise and practically support recourse for patients.
Authors:Zhipeng Li, Yi-Chi Liao, Christian Holz
Abstract:
Adjusting visual parameters such as brightness and contrast is common in our everyday experiences. Finding the optimal parameter setting is challenging due to the large search space and the lack of an explicit objective function, leaving users to rely solely on their implicit preferences. Prior work has explored Preferential Bayesian Optimization (PBO) to address this challenge, involving users to iteratively select preferred designs from candidate sets. However, PBO often requires many rounds of preference comparisons, making it more suitable for designers than everyday end-users. We propose Meta-PO, a novel method that integrates PBO with meta-learning to improve sample efficiency. Specifically, Meta-PO infers prior users' preferences and stores them as models, which are leveraged to intelligently suggest design candidates for the new users, enabling faster convergence and more personalized results. An experimental evaluation of our method for appearance design tasks on 2D and 3D content showed that participants achieved satisfactory appearance in 5.86 iterations using Meta-PO when participants shared similar goals with a population (e.g., tuning for a ``warm'' look) and in 8 iterations even generalizes across divergent goals (e.g., from ``vintage'', ``warm'', to ``holiday''). Meta-PO makes personalized visual optimization more applicable to end-users through a generalizable, more efficient optimization conditioned on preferences, with the potential to scale interface personalization more broadly.
Authors:Chowdhury Shahriar Muzammel, Maria Spichkova, James Harland
Abstract:
In requirements engineering (RE), personas are now being used to represent user expectations and needs. This systematic mapping study (SMS) aims to explore the most recent studies and to cover recent changes in trends, especially related to the recent evolution of Generative AI approaches. Our SMS covers the period between April 2023 and April 2025. We identified 22 relevant publications and analysed persona representation, construction, validation, as well as RE activities covered by personas. We identified that a number of studies applied AI-based solutions for persona construction and validation. We observed that template-based personas are becoming more popular nowadays. We also observed an increase in the proportion of studies covering validation aspects.
Authors:Chowdhury Shahriar Muzammel, Maria Spichkova, James Harland
Abstract:
Requirements Engineering (RE) is one of the most interaction-intensive phases of software development. This means that RE activities might be especially impacted by stakeholders' national culture. Software development projects increasingly have a very diverse range of stakeholders. To future-proof RE activities, we need to help RE practitioners avoid misunderstandings and conflicts that might arise from not understanding potential Cultural Influences (CIs). Moreover, an awareness of CIs supports diversity and inclusion in the IT profession. Bangladesh has a growing IT sector with some unique socio-cultural characteristics, and has been largely overlooked in this research field. In this study, we aim to investigate how the RE process is adopted in the context of Bangladeshi culture and what cultural influences impact overall RE activities.
Authors:Taufiq Daryanto, Sophia Stil, Xiaohan Ding, Daniel Manesh, Sang Won Lee, Tim Lee, Stephanie Lunn, Sarah Rodriguez, Chris Brown, Eugenia Rho
Abstract:
One challenge in technical interviews is the think-aloud process, where candidates verbalize their thought processes while solving coding tasks. Despite its importance, opportunities for structured practice remain limited. Conversational AI offers potential assistance, but limited research explores user perceptions of its role in think-aloud practice. To address this gap, we conducted a study with 17 participants using an LLM-based technical interview practice tool. Participants valued AI's role in simulation, feedback, and learning from generated examples. Key design recommendations include promoting social presence in conversational AI for technical interview simulation, providing feedback beyond verbal content analysis, and enabling crowdsourced think-aloud examples through human-AI collaboration. Beyond feature design, we examined broader considerations, including intersectional challenges and potential strategies to address them, how AI-driven interview preparation could promote equitable learning in computing careers, and the need to rethink AI's role in interview practice by suggesting a research direction that integrates human-AI collaboration.
Authors:Rui Sheng, Chuhan Shi, Sobhan Lotfi, Shiyi Liu, Adam Perer, Huamin Qu, Furui Cheng
Abstract:
Human-AI interfaces play a crucial role in advancing practices and research within the healthcare domain. However, designing such interfaces presents a substantial challenge for designers. In this paper, we propose systematic guidance for designing human-AI interfaces in typical healthcare scenarios by summarizing the design patterns for presenting and interacting with common information entities. To deepen our understanding of these 12 design patterns, we interviewed 12 healthcare professionals to explore potential usage scenarios and important considerations. Furthermore, we conducted workshops with 14 participants recruited online to evaluate our design patterns. Finally, we discussed the generalizability of the design patterns to other application domains, the limitations, and the future work.
Authors:Daniel Platnick, Matti Gruener, Marjan Alirezaie, Kent Larson, Dava J. Newman, Hossein Rahnama
Abstract:
AI-enhanced Extended Reality (XR) aims to deliver adaptive, immersive experiences-yet current systems fall short due to shallow user modeling and limited cognitive context. We introduce Perspective-Aware AI in Extended Reality (PAiR), a foundational framework for integrating Perspective-Aware AI (PAi) with XR to enable interpretable, context-aware experiences grounded in user identity. PAi is built on Chronicles: reasoning-ready identity models learned from multimodal digital footprints that capture users' cognitive and experiential evolution. PAiR employs these models in a closed-loop system linking dynamic user states with immersive environments. We present PAiR's architecture, detailing its modules and system flow, and demonstrate its utility through two proof-of-concept scenarios implemented in the Unity-based OpenDome engine. PAiR opens a new direction for human-AI interaction by embedding perspective-based identity models into immersive systems.
Authors:Vimaleswar A, Prabhu Nandan Sahu, Nilesh Kumar Sahu, Haroon R Lone
Abstract:
Mental health plays a crucial role in the overall well-being of an individual. In recent years, digital platforms have been increasingly used to expand mental health and emotional support. However, there are persistent challenges related to limited user accessibility, internet connectivity, and data privacy, which highlight the need for an offline, smartphone-based solution. To address these challenges, we propose EmoSApp (Emotional Support App): an entirely offline, smartphone-based conversational app designed for mental health and emotional support. The system leverages Large Language Models (LLMs), specifically fine-tuned, quantized and deployed using Torchtune and Executorch for resource-constrained devices, allowing all inferences to occur on the smartphone. To equip EmoSApp with robust domain expertise, we fine-tuned the LLaMA-3.2-1B-Instruct model on our custom curated ``Knowledge dataset'' of 14,582 mental-health QA pairs, along with the multi-turn conversational data.
Through qualitative human evaluation with the student population, we demonstrate that EmoSApp has the ability to respond coherently, empathetically, maintain interactive dialogue, and provide relevant suggestions to user's mental health problems. Additionally, quantitative evaluations on nine standard commonsense and reasoning benchmarks demonstrate the efficacy of our fine-tuned, quantized model in low-resource settings. By prioritizing on-device deployment and specialized domain adaptation, EmoSApp serves as a blueprint for future innovations in portable, secure, and highly tailored AI-driven mental health solutions.
Authors:Shaohan Shi, Yuheng Shao, Haoran Jiang, Yunjie Yao, Zhijun Zhang, Xu Ding, Quan Li
Abstract:
Medical images often contain multiple labels with imbalanced distributions and co-occurrence, leading to bias in multi-label medical image classification. Close collaboration between medical professionals and machine learning practitioners has significantly advanced medical image analysis. However, traditional collaboration modes struggle to facilitate effective feedback between physicians and AI models, as integrating medical expertise into the training process via engineers can be time-consuming and labor-intensive. To bridge this gap, we introduce MEDebiaser, an interactive system enabling physicians to directly refine AI models using local explanations. By combining prediction with attention loss functions and employing a customized ranking strategy to alleviate scalability, MEDebiaser allows physicians to mitigate biases without technical expertise, reducing reliance on engineers, and thus enhancing more direct human-AI feedback. Our mechanism and user studies demonstrate that it effectively reduces biases, improves usability, and enhances collaboration efficiency, providing a practical solution for integrating medical expertise into AI-driven healthcare.
Authors:Shuchang Xu, Xiaofu Jin, Wenshuo Zhang, Huamin Qu, Yukang Yan
Abstract:
360° videos enable users to freely choose their viewing paths, but blind and low vision (BLV) users are often excluded from this interactive experience. To bridge this gap, we present Branch Explorer, a system that transforms 360° videos into branching narratives -- stories that dynamically unfold based on viewer choices -- to support interactive viewing for BLV audiences. Our formative study identified three key considerations for accessible branching narratives: providing diverse branch options, ensuring coherent story progression, and enabling immersive navigation among branches. To address these needs, Branch Explorer employs a multi-modal machine learning pipeline to generate diverse narrative paths, allowing users to flexibly make choices at detected branching points and seamlessly engage with each storyline through immersive audio guidance. Evaluation with 12 BLV viewers showed that Branch Explorer significantly enhanced user agency and engagement in 360° video viewing. Users also developed personalized strategies for exploring 360° content. We further highlight implications for supporting accessible exploration of videos and virtual environments.
Authors:Md. Saif Hassan Onim, Andrew M. Kiselica, Himanshu Thapliyal
Abstract:
Emotion detection in older adults is crucial for understanding their cognitive and emotional well-being, especially in hospital and assisted living environments. In this work, we investigate an edge-based, non-obtrusive approach to emotion identification that uses only physiological signals obtained via wearable sensors. Our dataset includes data from 40 older individuals. Emotional states were obtained using physiological signals from the Empatica E4 and Shimmer3 GSR+ wristband and facial expressions were recorded using camera-based emotion recognition with the iMotion's Facial Expression Analysis (FEA) module. The dataset also contains twelve emotion categories in terms of relative intensities. We aim to study how well emotion recognition can be accomplished using simply physiological sensor data, without the requirement for cameras or intrusive facial analysis. By leveraging classical machine learning models, we predict the intensity of emotional responses based on physiological signals. We achieved the highest 0.782 r2 score with the lowest 0.0006 MSE on the regression task. This method has significant implications for individuals with Alzheimer's Disease and Related Dementia (ADRD), as well as veterans coping with Post-Traumatic Stress Disorder (PTSD) or other cognitive impairments. Our results across multiple classical regression models validate the feasibility of this method, paving the way for privacy-preserving and efficient emotion recognition systems in real-world settings.
Authors:Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky
Abstract:
Reliable data is a cornerstone of modern organizational systems. A notable data integrity challenge stems from label bias, which refers to systematic errors in a label, a covariate that is central to a quantitative analysis, such that its quality differs across social groups. This type of bias has been conceptually and empirically explored and is widely recognized as a pressing issue across critical domains. However, effective methodologies for addressing it remain scarce. In this work, we propose Decoupled Confident Learning (DeCoLe), a principled machine learning based framework specifically designed to detect mislabeled instances in datasets affected by label bias, enabling bias aware mislabelling detection and facilitating data quality improvement. We theoretically justify the effectiveness of DeCoLe and evaluate its performance in the impactful context of hate speech detection, a domain where label bias is a well documented challenge. Empirical results demonstrate that DeCoLe excels at bias aware mislabeling detection, consistently outperforming alternative approaches for label error detection. Our work identifies and addresses the challenge of bias aware mislabeling detection and offers guidance on how DeCoLe can be integrated into organizational data management practices as a powerful tool to enhance data reliability.
Authors:Akshat Choube, Ha Le, Jiachen Li, Kaixin Ji, Vedant Das Swain, Varun Mishra
Abstract:
The ubiquitous presence of smartphones and wearables has enabled researchers to build prediction and detection models for various health and behavior outcomes using passive sensing data from these devices. Achieving a high-level, holistic understanding of an individual's behavior and context, however, remains a significant challenge. Due to the nature of passive sensing data, sensemaking -- the process of interpreting and extracting insights -- requires both domain knowledge and technical expertise, creating barriers for different stakeholders. Existing systems designed to support sensemaking are either not open-ended or cannot perform complex data triangulation. In this paper, we present a novel sensemaking system, Group of LLMs for Open-ended Sensemaking (GLOSS), capable of open-ended sensemaking and performing complex multimodal triangulation to derive insights. We demonstrate that GLOSS significantly outperforms the commonly used Retrieval-Augmented Generation (RAG) technique, achieving 87.93% accuracy and 66.19% consistency, compared to RAG's 29.31% accuracy and 52.85% consistency. Furthermore, we showcase the promise of GLOSS through four use cases inspired by prior and ongoing work in the UbiComp and HCI communities. Finally, we discuss the potential of GLOSS, its broader implications, and the limitations of our work.
Authors:Md Dilshadur Rahman, Md Rahat-uz- Zaman, Andrew McNutt, Paul Rosen
Abstract:
Annotations are central to effective data communication, yet most visualization tools treat them as secondary constructs -- manually defined, difficult to reuse, and loosely coupled to the underlying visualization grammar. We propose a declarative extension to Wilkinson's Grammar of Graphics that reifies annotations as first-class design elements, enabling structured specification of annotation targets, types, and positioning strategies. To demonstrate the utility of our approach, we develop a prototype extension called Vega-Lite Annotation. Through comparison with eight existing tools, we show that our approach enhances expressiveness, reduces authoring effort, and enables portable, semantically integrated annotation workflows.
Authors:Karine Karine, Benjamin M. Marlin
Abstract:
The use of reinforcement learning (RL) methods to support health behavior change via personalized and just-in-time adaptive interventions is of significant interest to health and behavioral science researchers focused on problems such as smoking cessation support and physical activity promotion. However, RL methods are often applied to these domains using a small collection of context variables to mitigate the significant data scarcity issues that arise from practical limitations on the design of adaptive intervention trials. In this paper, we explore an approach to significantly expanding the state space of an adaptive intervention without impacting data efficiency. The proposed approach enables intervention participants to provide natural language descriptions of aspects of their current state. It then leverages inference with pre-trained large language models (LLMs) to better align the policy of a base RL method with these state descriptions. To evaluate our method, we develop a novel physical activity intervention simulation environment that generates text-based state descriptions conditioned on latent state variables using an auxiliary LLM. We show that this approach has the potential to significantly improve the performance of online policy learning methods.
Authors:Haodong Zhang, Hongqi Li
Abstract:
Electroencephalography (EEG) is one of the most common signals used to capture the electrical activity of the brain, and the decoding of EEG, to acquire the user intents, has been at the forefront of brain-computer/machine interfaces (BCIs/BMIs) research. Compared to traditional EEG analysis methods with machine learning, the advent of deep learning approaches have gradually revolutionized the field by providing an end-to-end long-cascaded architecture, which can learn more discriminative features automatically. Among these, Transformer is renowned for its strong handling capability of sequential data by the attention mechanism, and the application of Transformers in various EEG processing tasks is increasingly prevalent. This article delves into a relevant survey, summarizing the latest application of Transformer models in EEG decoding since it appeared. The evolution of the model architecture is followed to sort and organize the related advances, in which we first elucidate the fundamentals of the Transformer that benefits EEG decoding and its direct application. Then, the common hybrid architectures by integrating basic Transformer with other deep learning techniques (convolutional/recurrent/graph/spiking neural netwo-rks, generative adversarial networks, diffusion models, etc.) is overviewed in detail. The research advances of applying the modified intrinsic structures of customized Transformer have also been introduced. Finally, the current challenges and future development prospects in this rapidly evolving field are discussed. This paper aims to help readers gain a clear understanding of the current state of Transformer applications in EEG decoding and to provide valuable insights for future research endeavors.
Authors:Pengkun Liu, Jackson Greene, Jiali Huang, Pingbo Tang, Yu Hou
Abstract:
Researchers have been using simulation-based methods for drone-assisted inspection training. Multiple brain regions are associated with information processes and decision-making, and the connectivity of these regions may further influence inspectors' performance. However, researchers do not understand the pathways of the information flows when drone pilots process the maintenance and manipulation of information, which may affect the efficiency of tacit knowledge transfer. This study aims to reveal the causal connection between participants' brain regions using an electroencephalogram and dynamic causal modeling when processing drone-assisted building energy audit tasks using different display modalities. The results showed similar single-direction connectivity patterns for the different simulation groups. The results also showed similar patterns between brain regions related to visual inspection performance before and after training. These findings highlight the nature of brain asymmetries and may be utilized in measuring cognitive states and designing adaptive automation in the knowledge transfer of drone-based inspection.
Authors:Vishakha Lall, Yisi Liu
Abstract:
Traditional simulator-based training for maritime professionals is critical for ensuring safety at sea but often depends on subjective trainer assessments of technical skills, behavioral focus, communication, and body language, posing challenges such as subjectivity, difficulty in measuring key features, and cognitive limitations. Addressing these issues, this study develops an AI-driven framework to enhance maritime training by objectively assessing trainee performance through visual focus tracking, speech recognition, and stress detection, improving readiness for high-risk scenarios. The system integrates AI techniques, including visual focus determination using eye tracking, pupil dilation analysis, and computer vision; communication analysis through a maritime-specific speech-to-text model and natural language processing; communication correctness using large language models; and mental stress detection via vocal pitch. Models were evaluated on data from simulated maritime scenarios with seafarers exposed to controlled high-stress events. The AI algorithms achieved high accuracy, with ~92% for visual detection, ~91% for maritime speech recognition, and ~90% for stress detection, surpassing existing benchmarks. The system provides insights into visual attention, adherence to communication checklists, and stress levels under demanding conditions. This study demonstrates how AI can transform maritime training by delivering objective performance analytics, enabling personalized feedback, and improving preparedness for real-world operational challenges.
Authors:Na Lee, Konstantinos Barmpas, Yannis Panagakis, Dimitrios Adamos, Nikolaos Laskaris, Stefanos Zafeiriou
Abstract:
Foundation Models have demonstrated significant success across various domains in Artificial Intelligence (AI), yet their capabilities for brainwave modeling remain unclear. In this paper, we comprehensively evaluate current Large Brainwave Foundation Models (LBMs) through systematic fine-tuning experiments across multiple Brain-Computer Interface (BCI) benchmark tasks, including memory tasks and sleep stage classification. Our extensive analysis shows that state-of-the-art LBMs achieve only marginal improvements (0.9%-1.2%) over traditional deep architectures while requiring significantly more parameters (millions vs thousands), raising important questions about their efficiency and applicability in BCI contexts. Moreover, through detailed ablation studies and Low-Rank Adaptation (LoRA), we significantly reduce trainable parameters without performance degradation, while demonstrating that architectural and training inefficiencies limit LBMs' current capabilities. Our experiments span both full model fine-tuning and parameter-efficient adaptation techniques, providing insights into optimal training strategies for BCI applications. We pioneer the application of LoRA to LBMs, revealing that performance benefits generally emerge when adapting multiple neural network components simultaneously. These findings highlight the critical need for domain-specific development strategies to advance LBMs, suggesting that current architectures may require redesign to fully leverage the potential of foundation models in brainwave analysis.
Authors:Sifatul Anindho, Videep Venkatesha, Nathaniel Blanchard
Abstract:
Identification of affective and attentional states of individuals within groups is difficult to obtain without disrupting the natural flow of collaboration. Recent work from our group used a retrospect cued recall paradigm where participants spoke about their cognitive-affective states while they viewed videos of their groups. We then collected additional participants where their reports were constrained to a subset of pre-identified cognitive-affective states. In this latter case, participants either self reported or reported in response to probes. Here, we present an initial analysis of the frequency and temporal distribution of participant reports, and how the distributions of labels changed across the two collections. Our approach has implications for the educational data mining community in tracking cognitive-affective states in collaborative learning more effectively and in developing improved adaptive learning systems that can detect and respond to cognitive-affective states.
Authors:Mayar Elfares, Pascal Reisert, Ralf Küsters, Andreas Bulling
Abstract:
Privacy is a highly subjective concept and perceived variably by different individuals. Previous research on quantifying user-perceived privacy has primarily relied on questionnaires. Furthermore, applying user-perceived privacy to optimise the parameters of privacy-preserving techniques (PPT) remains insufficiently explored. To address these limitations, we introduce Gaze3P -- the first dataset specifically designed to facilitate systematic investigations into user-perceived privacy. Our dataset comprises gaze data from 100 participants and 1,000 stimuli, encompassing a range of private and safe attributes. With Gaze3P, we train a machine learning model to implicitly and dynamically predict perceived privacy from human eye gaze. Through comprehensive experiments, we show that the resulting models achieve high accuracy. Finally, we illustrate how predicted privacy can be used to optimise the parameters of differentially private mechanisms, thereby enhancing their alignment with user expectations.
Authors:Rafael A. Calvo, Dorian Peters
Abstract:
Conversational AI (CAI) systems offer opportunities to scale service provision to unprecedented levels and governments and corporations are already beginning to deploy them across services. The economic argument is similar across domains: use CAI to automate the time-consuming conversations required for customer, client or patient support. Herein we draw on our work in dementia care to explore some of the challenges and opportunities for CAI, and how a new way of conceptualising these systems could help ensure essential aspects for human thriving are not lost in the process of automation.
Authors:Lucy Jiang, Lotus Zhang, Leah Findlater
Abstract:
Many blind and low vision (BLV) people are excluded from professional roles that may involve visual tasks due to access barriers and persisting stigmas. Advancing generative AI systems can support BLV people through providing contextual and personalized visual descriptions for creation, critique, and consumption. In this workshop paper, we provide design suggestions for how visual descriptions can be better contextualized for multiple professional tasks. We conclude by discussing how these designs can improve autonomy, inclusion, and skill development over time.
Authors:Yaning Li, Yutong Chen, Yihan Hou, Chenyi Chen, Yihan Han, Jingxuan Han, Wenxi Dai, Youyou Li, Xinke Tang, Meng Li, Qi Dong, Hongwei Li
Abstract:
Lacquerware, a representative craft of Chinese intangible cultural heritage, is renowned for its layered aesthetics and durability but faces declining engagement. While prior human-computer interaction research has explored embedding interactive circuits to transform lacquerware into responsive artifacts, most studies have focused on fabrication techniques rather than supporting makers in creatively designing such interactions at a low threshold. To address this gap, we present LacAIDes, a Generative AI powered creativity-support tool built on a multi-agent workflow aligned with the double diamond model of design thinking. LacAIDes enables exploration and creation of culturally grounded interactive circuits without requiring prior technical expertise. We evaluated LacAIDes in a longitudinal workshop with 34 participants using a mixed-method approach. Results show that LacAIDes demonstrated high usability, enhanced creative engagement in craft making, and encouraged critical reflection on the role of Generative AI in digital craft practices. This work contributes to human-computer interaction by introducing a novel creativity-support tool and providing empirical insights into revitalizing traditional craft making through Generative AI.
Authors:Mariana Fernandez-Espinosa, Kai Zhang, Jad Bendarkawi, Ashley Ponce, Sean Chidozie Mata, Aminah Aliu, Lei Zhang, Francisco Fernandez Medina, Elena Mangione-Lora, Andres Monroy-Hernandez, Diego Gomez-Zara
Abstract:
Developing speaking proficiency in a second language can be cognitively demanding and emotionally taxing, often triggering fear of making mistakes or being excluded from larger groups. While current learning tools show promise for speaking practice, most focus on dyadic, scripted scenarios, limiting opportunities for dynamic group interactions. To address this gap, we present ConversAR, a Mixed Reality system that leverages Generative AI and XR to support situated and personalized group conversations. It integrates embodied AI agents, scene recognition, and generative 3D props anchored to real-world surroundings. Based on a formative study with experts in language acquisition, we developed and tested this system with a user study with 21 second-language learners. Results indicate that the system enhanced learner engagement, increased willingness to communicate, and offered a safe space for speaking. We discuss the implications for integrating Generative AI and XR into the design of future language learning applications.
Authors:Rachel L. Franz, Jacob O. Wobbrock
Abstract:
Today's virtual reality (VR) systems and environments assume that users have typical abilities, which can make VR inaccessible to people with physical impairments. However, there is not yet an understanding of how inaccessible locomotion techniques are, and which interactions make them inaccessible. To this end, we conducted a study in which people with and without upper-body impairments navigated a virtual environment with six locomotion techniques to quantify performance differences among groups. We found that groups performed similarly with Sliding Looking on all performance measures, suggesting that this might be a good default locomotion technique for VR apps. To understand the nature of performance differences with the other techniques, we collected low-level interaction data from the controllers and headset and analyzed interaction differences with a set of movement-, button-, and target-related metrics. We found that movement-related metrics from headset data reveal differences among groups with all techniques, suggesting these are good metrics for identifying whether a user has an upper-body impairment. We also identify movement-, button, and target-related metrics that can explain performance differences between groups for particular locomotion techniques.
Authors:Yaning Li, Ke Zhao, Shucheng Zheng, Xingyu Chen, Chenyi Chen, Wenxi Dai, Weile Jiang, Qi Dong, Yiqing Zhao, Meng Li, Lin-Ping Yuan
Abstract:
Lost architectural heritage presents interpretive challenges due to vanished structures and fragmented historical records. Using Hanyuan Hall of the Tang dynasty's Daming Palace as a case study, we conducted a formative investigation with archaeologists, heritage administrators, and visitors to identify key issues in current interpretation practices. We found that these practices often compress complex cultural layers into factual summaries and rely on linear narratives that overlook the continuing reinterpretations following a site's disappearance. In response, we designed Pre/Absence, a virtual reality experience grounded in the presence-absence dialectic to interweave tangible and vanished aspects of heritage within a spatiotemporal narrative. A mixed-method study with 28 participants compared Pre/Absence to a paper-based experience. Both improved users' factual understanding, but the VR experience more strongly enhanced cultural awareness, evoked emotional engagement with loss, and encouraged critical reflection on the evolving social and political meanings of heritage. The findings suggest that VR can move beyond static reconstruction to engage users as co-constructors of cultural meaning, providing a nuanced framework for critical heritage narrative design in human-computer interaction.
Authors:Nicola Rossberg, Bennett Kleinberg, Barry O'Sullivan, Luca Longo, Andrea Visentin
Abstract:
As artificial intelligence becomes increasingly pervasive and powerful, the ability to audit AI-based systems is becoming increasingly important. However, explainability for artificial intelligence systems is not a one-size-fits-all solution; different target audiences have varying requirements and expectations for explanations. While various approaches to explainability have been proposed, most explainable artificial intelligence (XAI) methods for tabular data focus on explaining the outputs of supervised machine learning models using the input features. However, a user's ability to understand an explanation depends on their understanding of such features. Therefore, it is in the best interest of the system designer to try to pre-select understandable features for producing a global explanation of an ML model. Unfortunately, no measure currently exists to assess the degree to which a user understands a given input feature. This work introduces psychometrically validated scales that quantitatively seek to assess users' understanding of tabular input features for supervised classification problems. In detail, these scales, one for numerical and one for categorical data, each with two factors and comprising 8 and 9 items, aim to assign a score to each input feature, effectively producing a rank, and allowing for the quantification of feature prioritisation. A confirmatory factor analysis demonstrates a strong relationship between such items and a good fit of the two-factor structure for each scale. This research presents a novel method for assessing understanding and outlines potential applications in the domain of explainable artificial intelligence.
Authors:Yajing Wang, Talayeh Aledavood, Juhi Kulshrestha
Abstract:
Loneliness has reached epidemic proportions globally, posing serious risks to mental and physical health. As social media platforms increasingly mediate social interaction, understanding their relationship with loneliness has become urgent. While survey-based research has examined social media use and loneliness, findings remain mixed, and little is known about when and how often people engage with social media, or about whether different types of platforms are differently associated with loneliness. Web trace data now enable objective examination of these behavioral dimensions. We asked whether objectively measured patterns of social media engagement differ between lonely and non-lonely individuals across devices and platform types. Analyzing six months of web trace data combined with repeated surveys ($N=589$ mobile users; $N=851$ desktop users), we found that greater social media use was associated with higher loneliness across both devices, with this relationship specific to social media rather than other online activities. On desktop, lonely individuals exhibited shorter sessions but more frequent daily engagement. Lonely individuals spent more time on visual-sharing ($g = -0.47$), messaging ($g = -0.36$), and networking-oriented platforms on mobile. These findings demonstrate how longitudinal web trace data can reveal behavioral patterns associated with loneliness, and more broadly illustrate the potential of digital traces for studying other psychological states. Beyond research, the results inform the responsible design of digital interventions and platform features that better support psychological well-being across different technological contexts.
Authors:Ziming Wang, Yiqian Wu, Qingxiao Zheng, Shihan Zhang, Ned Barker, Morten Fjeld
Abstract:
What if future dining involved eating robots? We explore this question through a playful and poetic experiential dinner theater: a tangible design fiction staged as a 2052 Paris restaurant where diners consume a biohybrid flying robot in place of the banned delicacy of ortolan bunting. Moving beyond textual or visual speculation, our ``dinner-in-the-drama'' combined performance, ritual, and multisensory immersion to provoke reflection on sustainability, ethics, and cultural identity. Six participants from creative industries engaged as diners and role-players, responding with curiosity, discomfort, and philosophical debate. They imagined biohybrids as both plausible and unsettling -- raising questions of sentience, symbolism, and technology adoption that exceed conventional sustainability framings of synthetic meat. Our contributions to HCI are threefold: (i) a speculative artifact that stages robots as food, (ii) empirical insights into how publics negotiate cultural and ethical boundaries in post-natural eating, and (iii) a methodological advance in embodied, multisensory design fiction.
Authors:Ziming Wang, Shiwei Yang, Rebecca Currano, Morten Fjeld, David Sirkin
Abstract:
AI-powered road surveillance systems are increasingly proposed to monitor infractions such as speeding, phone use, and jaywalking. While these systems promise to enhance safety by discouraging dangerous behaviors, they also raise concerns about privacy, fairness, and potential misuse of personal data. Yet empirical research on how people perceive AI-enhanced monitoring of public spaces remains limited. We conducted an online survey ($N=720$) using a 3$\times$3 factorial design to examine perceptions of three road surveillance modes -- conventional, AI-enhanced, and AI-enhanced with public shaming -- across China, Europe, and the United States. We measured perceived capability, risk, transparency, and acceptance. Results show that conventional surveillance was most preferred, while public shaming was least preferred across all regions. Chinese respondents, however, expressed significantly higher acceptance of AI-enhanced modes than Europeans or Americans. Our findings highlight the need to account for context, culture, and social norms when considering AI-enhanced monitoring, as these shape trust, comfort, and overall acceptance.
Authors:Suchismita Naik, Austin L. Toombs, Amanda Snellinger, Scott Saponas, Amanda K. Hall
Abstract:
With recent advancements in multi-agent generative AI (Gen AI), technology organizations like Microsoft are adopting these complex tools, redefining AI agents as active collaborators in complex workflows rather than as passive tools. In this study, we investigated how early adopters and developers conceptualize multi-agent Gen AI tools, focusing on how they understand human-AI collaboration mechanisms, general collaboration dynamics, and transparency in the context of AI tools. We conducted semi-structured interviews with 13 developers, all early adopters of multi-agent Gen AI technology who work at Microsoft. Our findings revealed that these early adopters conceptualize multi-agent systems as "teams" of specialized role-based and task-based agents, such as assistants or reviewers, structured similar to human collaboration models and ranging from AI-dominant to AI-assisted, user-controlled interactions. We identified key challenges, including error propagation, unpredictable and unproductive agent loop behavior, and the need for clear communication to mitigate the layered transparency issues. Early adopters' perspectives about the role of transparency underscored its importance as a way to build trust, verify and trace errors, and prevent misuse, errors, and leaks. The insights and design considerations we present contribute to CSCW research about collaborative mechanisms with capabilities ranging from AI-dominant to AI-assisted interactions, transparency and oversight strategies in human-agent and agent-agent interactions, and how humans make sense of these multi-agent systems as dynamic, role-diverse collaborators which are customizable for diverse needs and workflows. We conclude with future research directions that extend CSCW approaches to the design of inter-agent and human mediation interactions.
Authors:Rachel L. Franz, Jacob O. Wobbrock
Abstract:
There are over a hundred virtual reality (VR) locomotion techniques that exist today, with new ones being designed as VR technology evolves. The different ways of controlling locomotion techniques (e.g., gestures, button inputs, body movements), along with the diversity of upper-body motor impairments, can make it difficult for a user to know which locomotion technique is best suited to their particular abilities. Moreover, trial-and-error can be difficult, time-consuming, and costly. Using machine learning techniques and data from 20 people with and without upper-body motor impairments, we developed a modeling approach to predict a ranked list of a user's fastest techniques based on questionnaire and interaction data. We found that a user's fastest technique could be predicted based on interaction data with 92% accuracy and that predicted locomotion times were within 12% of observed times. The model we trained could also rank six locomotion techniques based on speed with 61% accuracy and that predictions were within 8% of observed times. Our findings contribute to growing research in VR accessibility by taking an ability-based design approach to adapt systems to users' abilities.
Authors:Reza Habibi, Seung Wan Ha, Zhiyu Lin, Atieh Kashani, Ala Shafia, Lakshana Lakshmanarajan, Chia-Fang Chung, Magy Seif El-Nasr
Abstract:
Meaningful human-AI collaboration requires more than processing language; it demands a deeper understanding of symbols and their socially constructed meanings. While humans naturally interpret symbols through social interaction, AI systems often miss the dynamic interpretations that emerge in conversation. Drawing on Symbolic Interactionism theory, we conducted two studies to investigate how humans and AI co-construct symbols and their meanings. Findings provide empirical insights into how humans and conversational AI agents collaboratively shape meanings during interaction. We show how participants shift their initial definitions of meaning in response to the symbols and interpretations suggested by the conversational AI agents, especially when social context is introduced. We also observe how participants project their personal and social values into these interactions, refining meanings over time. These findings reveal that shared understanding does not emerge from mere agreement but from the bi-directional exchange and reinterpretation of symbols, suggesting new paradigms for human-AI interaction design.
Authors:Jieyu Zhou, Aryan Roy, Sneh Gupta, Daniel Weitekamp, Christopher J. MacLellan
Abstract:
Existing AI agents typically execute multi-step tasks autonomously and only allow user confirmation at the end. During execution, users have little control, making the confirm-at-end approach brittle: a single error can cascade and force a complete restart. Confirming every step avoids such failures, but imposes tedious overhead. Balancing excessive interruptions against costly rollbacks remains an open challenge. We address this problem by modeling confirmation as a minimum time scheduling problem. We conducted a formative study with eight participants, which revealed a recurring Confirmation-Diagnosis-Correction-Redo (CDCR) pattern in how users monitor errors. Based on this pattern, we developed a decision-theoretic model to determine time-efficient confirmation point placement. We then evaluated our approach using a within-subjects study where 48 participants monitored AI agents and repaired their mistakes while executing tasks. Results show that 81 percent of participants preferred our intermediate confirmation approach over the confirm-at-end approach used by existing systems, and task completion time was reduced by 13.54 percent.
Authors:Mateen Ahmed Abbasi, Petri Ihantola, Tommi Mikkonen, Niko Mäkitalo
Abstract:
Requirement Engineering (RE) is the foundation of successful software development. In RE, the goal is to ensure that implemented systems satisfy stakeholder needs through rigorous requirements elicitation, validation, and evaluation processes. Despite its critical role, RE continues to face persistent challenges, such as ambiguity, conflicting stakeholder needs, and the complexity of managing evolving requirements. A common view is that Artificial Intelligence (AI) has the potential to streamline the RE process, resulting in improved efficiency, accuracy, and management actions. However, using AI also introduces new concerns, such as ethical issues, biases, and lack of transparency. This paper explores how AI can enhance traditional RE practices by automating labor-intensive tasks, supporting requirement prioritization, and facilitating collaboration between stakeholders and AI systems. The paper also describes the opportunities and challenges that AI brings to RE. In particular, the vision calls for ethical practices in AI, along with a much-enhanced collaboration between academia and industry professionals. The focus should be on creating not only powerful but also trustworthy and practical AI solutions ready to adapt to the fast-paced world of software development.
Authors:Jessica Y. Bo, Majeed Kazemitabaar, Mengqing Deng, Michael Inzlicht, Ashton Anderson
Abstract:
Sycophancy, the tendency of LLM-based chatbots to express excessive enthusiasm, agreement, flattery, and a lack of disagreement, is emerging as a significant risk in human-AI interactions. However, the extent to which this affects human-LLM collaboration in complex problem-solving tasks is not well quantified, especially among novices who are prone to misconceptions. We created two LLM chatbots, one with high sycophancy and one with low sycophancy, and conducted a within-subjects experiment (n=24) in the context of debugging machine learning models to isolate the effect of LLM sycophancy on users' mental models, their workflows, reliance behaviors, and their perceptions of the chatbots. Our findings show that users of the high sycophancy chatbot were less likely to correct their misconceptions and spent more time over-relying on unhelpful LLM responses. Despite these impaired outcomes, a majority of users were unable to detect the presence of excessive sycophancy.
Authors:Yoojin Hong, Yersultan Doszhan, Joseph Seering
Abstract:
Current news commenting systems are designed based on implicitly individualistic assumptions, where discussion is the result of a series of disconnected opinions. This often results in fragmented and polarized conversations that fail to represent the spectrum of public discourse. In this work, we develop a news commenting system where users take on distributed roles to collaboratively structure the comments to encourage a connected, balanced discussion space. Through a within-subject, mixed-methods evaluation (N=38), we find that the system supported three stages of participation: understanding issues, collaboratively structuring comments, and building a discussion. With our system, users' comments displayed more balanced perspectives and a more emotionally neutral argumentation. Simultaneously, we observed reduced argument strength compared to a traditional commenting system, indicating a trade-off between inclusivity and depth. We conclude with design considerations and trade-offs for introducing distributed roles in news commenting system design.
Authors:Yoojin Hong, Martina Di Paola, Braahmi Padmakumar, Hwi Joon Lee, Mahnoor Shafiq, Joseph Seering
Abstract:
Social media platforms are central to communication, yet their designs remain narrowly focused on engagement and scale. While researchers have proposed alternative visions for online spaces, these ideas are difficult to prototype within platform constraints. In this paper, we introduce a metaphor-driven system to help users imagine and explore new social media environments. The system translates users' metaphors into structured sets of platform features and generates interactive simulations populated with LLM-driven agents. To evaluate this approach, we conducted a study where participants created and interacted with simulated social media spaces. Our findings show that metaphors allow users to express distinct social expectations, and that perceived authenticity of the simulation depended on how well it captured dynamics like intimacy, participation, and temporal engagement. We conclude by discussing how metaphor-driven simulation can be a powerful design tool for prototyping alternative social architectures and expanding the design space for future social platforms.
Authors:Nicolas Fracaro, Stefano Cecconello, Mauro Conti, Niccolò Di Marco, Alessandro Galeazzi
Abstract:
The study of art evolution has provided valuable insights into societal change, often revealing long-term patterns of simplification and transformation. Album covers represent a distinctive yet understudied form of visual art that has both shaped and been shaped by cultural, technological, and commercial dynamics over the past century. As highly visible artifacts at the intersection of art and commerce, they offer a unique lens through which to study cultural evolution. In this work, we examine the visual complexity of album covers spanning 75 years and 11 popular musical genres. Using a diverse set of computational measures that capture multiple dimensions of visual complexity, our analysis reveals a broad shift toward minimalism across most genres, with notable exceptions that highlight the heterogeneity of aesthetic trends. At the same time, we observe growing variance over time, with many covers continuing to display high levels of abstraction and intricacy. Together, these findings position album covers as a rich, quantifiable archive of cultural history and underscore the value of computational approaches in the systematic study of the arts, bridging quantitative analysis with aesthetic and cultural inquiry.
Authors:Rudrajit Choudhuri, Carmen Badea, Christian Bird, Jenna Butler, Rob DeLine, Brian Houck
Abstract:
Generative AI is reshaping software work, yet we lack clear guidance on where developers most need and want support, and how to design it responsibly. We report a large-scale, mixed-methods study of N=860 developers that examines where, why, and how they seek or limit AI help, providing the first task-aware, empirically validated mapping from developers' perceptions of their tasks to AI adoption patterns and responsible AI priorities. Using cognitive appraisal theory, we show that task evaluations predict openness to and use of AI, revealing distinct patterns: strong current use and a desire for improvement in core work (e.g., coding, testing); high demand to reduce toil (e.g., documentation, operations); and clear limits for identity- and relationship-centric work (e.g., mentoring). Priorities for responsible AI support vary by context: reliability and security for systems-facing tasks; transparency, alignment, and steerability to maintain control; and fairness and inclusiveness for human-facing work. Our results offer concrete, contextual guidance for delivering AI where it matters to developers and their work.
Authors:Xinyang Shan, Yuanyuan Xu, Yuqing Wang, Tian Xia, Yinshan Lin
Abstract:
This study investigates the design of inclusive wine-tasting experiences by examining the roles of human diversity and personal food memory. Through field studies conducted in various wine regions, we explored how Chinese visitors engage with wine-tasting activities during winery tours, highlighting the cross-cultural challenges they face. Our findings underscore the importance of experiencers' abilities, necessities, and aspirations (ANAs), the authenticity of wine tasting within the context of winery tours, and the use of personal food memories as a wine-tasting tool accessible to all. These insights lay the groundwork for developing more inclusive and engaging wine-tasting services, offering new perspectives for cultural exchange and sustainable wine business practices in China.
Authors:Xinyang Shan, Yuanyuan Xu, Tian Xia, Yinshan Lin
Abstract:
Wine tasting is a multimodal and culturally embedded activity that presents unique challenges when adapted to non-Western contexts. This paper proposes a service design approach rooted in contextual co-creation to reimagine wine tasting experiences for Chinese consumers. Drawing on 26 in-situ interviews and follow-up validation sessions, we identify three distinct user archetypes: Curious Tasters, Experience Seekers, and Knowledge Builders, each exhibiting different needs in vocabulary, interaction, and emotional pacing. Our findings reveal that traditional wine descriptors lack cultural resonance and that cross-modal metaphors grounded in local gastronomy (e.g., green mango for acidity) significantly improve cognitive and emotional engagement. These insights informed a partially implemented prototype, featuring AI-driven metaphor-to-flavour mappings and real-time affective feedback visualisation. A small-scale usability evaluation confirmed improvements in engagement and comprehension. Our comparative analysis shows alignment with and differentiation from prior multimodal and affect-aware tasting systems. This research contributes to CBMI by demonstrating how culturally adaptive interaction systems can enhance embodied consumption experiences in physical tourism and beyond.
Authors:Niklas Gutheil, Valentin Mayer, Leopold Müller, Jörg Rommelt, Niklas Kühl
Abstract:
Effective prompt engineering is critical to realizing the promised productivity gains of large language models (LLMs) in knowledge-intensive tasks. Yet, many users struggle to craft prompts that yield high-quality outputs, limiting the practical benefits of LLMs. Existing approaches, such as prompt handbooks or automated optimization pipelines, either require substantial effort, expert knowledge, or lack interactive guidance. To address this gap, we design and evaluate PromptPilot, an interactive prompting assistant grounded in four empirically derived design objectives for LLM-enhanced prompt engineering. We conducted a randomized controlled experiment with 80 participants completing three realistic, work-related writing tasks. Participants supported by PromptPilot achieved significantly higher performance (median: 78.3 vs. 61.7; p = .045, d = 0.56), and reported enhanced efficiency, ease-of-use, and autonomy during interaction. These findings empirically validate the effectiveness of our proposed design objectives, establishing LLM-enhanced prompt engineering as a viable technique for improving human-AI collaboration.
Authors:Zahra Fakoor Harehdasht, Raziyeh Saki
Abstract:
Recently, the unequal presence of women compared to men in technology has attracted the attention of researchers and practitioners across multiple fields. It is time to regard this problem as a global crisis that not only limits access to talent but also reduces the diversity of perspectives that shape technological innovation. This article examines the psychological and social barriers that influence this gap, as well as the interventions designed to reduce it. Using a structured review, the findings assemble evidence on the role of early gender stereotypes in the family and school and the continuation of this crisis in educational and career choices, through to the psychological challenges women face in professional settings, such as feelings of self-undervaluation, occupational anxiety, a heightened fear of technology, and structural limitations in educational environments. Special attention is paid to Germany, where the technology gap is particularly evident and where multiple national programs have been implemented to address it. The present review shows that effective solutions require more than anti-discrimination policies: they should include educational practices, organizational reforms, mentoring, and psychological support. The article concludes by outlining practical and research implications and introduces the NEURON project as a pilot interdisciplinary initiative aimed at accelerating current empowerment efforts and developing new programs for women in technology occupations.
Authors:Chinatsu Ozawa, Tatsuya Minagawa, Yoichi Ochiai
Abstract:
This study explores "Photo Tattooing," merging photography and body ornamentation, and introduces the concept of "Photographic Conviviality." Using our instant camera that prints images onto mesh screens for immediate body art, we examine how this integration affects personal expression and challenges traditional photography. Workshops revealed that this fusion redefines photography's role, fostering intimacy and shared experiences, and opens new avenues for self-expression by transforming static images into dynamic, corporeal experiences.
Authors:Flavio Figueiredo, Giovanni Martinelli, Henrique Sousa, Pedro Rodrigues, Frederico Pedrosa, Lucas N. Ferreira
Abstract:
Recent advances in AI music (AIM) generation services are currently transforming the music industry. Given these advances, understanding how humans perceive AIM is crucial both to educate users on identifying AIM songs, and, conversely, to improve current models. We present results from a listener-focused experiment aimed at understanding how humans perceive AIM. In a blind, Turing-like test, participants were asked to distinguish, from a pair, the AIM and human-made song. We contrast with other studies by utilizing a randomized controlled crossover trial that controls for pairwise similarity and allows for a causal interpretation. We are also the first study to employ a novel, author-uncontrolled dataset of AIM songs from real-world usage of commercial models (i.e., Suno). We establish that listeners' reliability in distinguishing AIM causally increases when pairs are similar. Lastly, we conduct a mixed-methods content analysis of listeners' free-form feedback, revealing a focus on vocal and technical cues in their judgments.
Authors:Diana Mykhaylychenko, Maisha Thasin, Dunya Baradari, Charmelle Mhungu
Abstract:
Animist worldviews treat beings, plants, landscapes, and even tools as persons endowed with spirit, an orientation that has long shaped human-nonhuman relations through ritual and moral practice. While modern industrial societies have often imagined technology as mute and mechanical, recent advances in artificial intelligence (AI), especially large language models (LLMs), invite people to anthropomorphize and attribute inner life to devices. This paper introduces A(I)nimism, an interactive installation exploring how large language objects (LLOs) can mediate animistic relationships with everyday things. Housed within a physical 'portal', the system uses GPT-4 Vision, voice input, and memory-based agents to create evolving object-personas. Encounters unfold through light, sound, and touch in a ritual-like process of request, conversation, and transformation that is designed to evoke empathy, wonder, and reflection. We situate the project within anthropological perspectives, speculative design, and spiritual HCI. AI's opacity, we argue, invites animistic interpretation, allowing LLOs to re-enchant the mundane and spark new questions of agency, responsibility, and design.
Authors:Haoning Xue, Yoo Jung Oh, Xinyi Zhou, Xinyu Zhang, Berit Oxley
Abstract:
Conversational AI, such as ChatGPT, is increasingly used for information seeking. However, little is known about how ordinary users actually prompt and how ChatGPT adapts its responses in real-world conversational information seeking (CIS). In this study, a nationally representative sample of 937 U.S. adults engaged in multi-turn CIS with ChatGPT on both controversial and non-controversial topics across science, health, and policy contexts. We analyzed both user prompting strategies and the communication styles of ChatGPT responses. The findings revealed behavioral signals of digital divide: only 19.1% of users employed prompting strategies, and these users were disproportionately more educated and Democrat-leaning. Further, ChatGPT demonstrated contextual adaptation: responses to controversial topics contain more cognitive complexity and more external references than to non-controversial topics. Notably, cognitively complex responses were perceived as less favorable but produced more positive issue-relevant attitudes. This study highlights disparities in user prompting behaviors and shows how user prompts and AI responses together shape information-seeking with conversational AI.
Authors:Balthazar Bujard, Jérôme Nika, Fédéric Bevilacqua, Nicolas Obin
Abstract:
This paper presents the first step in a research project situated within the field of musical agents. The objective is to achieve, through training, the tuning of the desired musical relationship between a live musical input and a real-time generated musical output, through the curation of a database of separated tracks. We propose an architecture integrating a symbolic decision module capable of learning and exploiting musical relationships from such musical corpus. We detail an offline implementation of this architecture employing Transformers as the decision module, associated with a perception module based on Wav2Vec 2.0, and concatenative synthesis as audio renderer. We present a quantitative evaluation of the decision module's ability to reproduce learned relationships extracted during training. We demonstrate that our decision module can predict a coherent track B when conditioned by its corresponding ''guide'' track A, based on a corpus of paired tracks (A, B).
Authors:Ke Luoma, Li Zengyi, Liao Jiangqun, Tong Song, Peng Kaiping
Abstract:
This study examines whether LLMs can simulate culturally grounded psychological patterns based on demographic information. Using DeepSeek, we generated 2943 virtual participants matched to demographic distributions from the CFPS2018 and compared them with human responses on the Big Five personality traits and subjective well-being across seven Chinese regions.Personality was measured using a 15-item Chinese Big Five inventory, and happiness with a single-item rating. Results revealed broad similarity between real and simulated datasets, particularly in regional variation trends. However, systematic differences emerged:simulated participants scored lower in extraversion and openness, higher in agreeableness and neuroticism, and consistently reported lower happiness. Predictive structures also diverged: while human data identified conscientiousness, extraversion and openness as positive predictors of happiness, the AI emphasized openness and agreeableness, with extraversion predicting negatively. These discrepancies suggest that while LLMs can approximate population-level psychological distributions, they underrepresent culturally specific and affective dimensions. The findings highlight both the potential and limitations of LLM-based virtual participants for large-scale psychological research and underscore the need for culturally enriched training data and improved affective modeling.
Authors:Florent Cabric, Margret Vilborg Bjarnadottir, Petra Isenberg
Abstract:
We investigate the impact of stereotyped gender-color associations in a visualization-driven decision-making task. In the context of gender data visualization, the well-known "pink for girls and blue for boys" color assignment is associated with stereotypes that could bias readers and decision-makers. Understanding the effects of using stereotyped colors in visualizations for decision-making can help designers better choose colors in stereotype-prone contexts. We therefore explore the potential impact of stereotyped colors on compensation decision-making through two crowdsourced experiments. In these experiments, we evaluate how the association of color with gender (stereotyped vs non-stereotyped) affects the user's allocation decisions in the context of salary adjustments. Our results indicate that explicit expression of the color-gender associations, in the form of a legend on the data visualization, leads to in-group favoritism. However, in the absence of a legend, this in-group favoritism disappears, and a small effect of non-stereotyped colors is observed. A free copy of this paper with all supplemental materials is available at https://osf.io/d4q3v/?view_only=22b636d6f7bb4a7991d9576933b3aaad
Authors:Ethan Chen, Sushant Kondguli, Carl Marshall, Yuhao Zhu
Abstract:
We introduce a gaze-tracking--free method to reduce OLED display power consumption in VR with minimal perceptual impact. This technique exploits the time course of chromatic adaptation, the human visual system's ability to maintain stable color perception under changing illumination. To that end, we propose a novel psychophysical paradigm that models how human adaptation state changes with the scene illuminant. We exploit this model to compute an optimal illuminant shift trajectory, controlling the rate and extent of illumination change, to reduce display power under a given perceptual loss budget. Our technique significantly improves the perceptual quality over prior work that applies illumination shifts instantaneously. Our technique can also be combined with prior work on luminance dimming to reduce display power by 31% with no statistical loss of perceptual quality.
Authors:Nana Wang, Gen Li, Suli Wang, Pengfei Ren, Hao Su
Abstract:
Electromyography (EMG)-based gesture recognition has emerged as a promising approach for human-computer interaction. However, its performance is often limited by the scarcity of labeled EMG data, significant cross-user variability, and poor generalization to unseen gestures. To address these challenges, we propose SeqEMG-GAN, a conditional, sequence-driven generative framework that synthesizes high-fidelity EMG signals from hand joint angle sequences. Our method introduces a context-aware architecture composed of an angle encoder, a dual-layer context encoder featuring the novel Ang2Gist unit, a deep convolutional EMG generator, and a discriminator, all jointly optimized via adversarial learning. By conditioning on joint kinematic trajectories, SeqEMG-GAN is capable of generating semantically consistent EMG sequences, even for previously unseen gestures, thereby enhancing data diversity and physiological plausibility. Experimental results show that classifiers trained solely on synthetic data experience only a slight accuracy drop (from 57.77% to 55.71%). In contrast, training with a combination of real and synthetic data significantly improves accuracy to 60.53%, outperforming real-only training by 2.76%. These findings demonstrate the effectiveness of our framework,also achieves the state-of-art performance in augmenting EMG datasets and enhancing gesture recognition performance for applications such as neural robotic hand control, AI/AR glasses, and gesture-based virtual gaming systems.
Authors:Jiayu Huang, Ruoxin Ritter Wang, Jen-Hao Liu, Boming Xia, Yue Huang, Ruoxi Sun, Jason Minhui Xue, Jinan Zou
Abstract:
Large language models (LLMs) are increasingly positioned as solutions for education, yet evaluations often reduce their impact to narrow performance metrics. This paper reframes the question by asking "what kind of impact should LLMs have in education?" Drawing on Biesta's tripartite account of good education: qualification, socialisation, and subjectification, we present a meta-analysis of 133 experimental and quasi-experimental studies (k = 188). Overall, the impact of LLMs on student learning is positive but uneven. Strong effects emerge in qualification, particularly when LLMs function as tutors in sustained interventions. Socialisation outcomes appear more variable, concentrated in sustained, reflective interventions. Subjectification, linked to autonomy and learner development, remains fragile, with improvements confined to small-scale, long-term studies. This purpose-level view highlights design as the decisive factor: without scaffolds for participation and agency, LLMs privilege what is easiest to measure while neglecting broader aims of education. For HCI and education, the issue is not just whether LLMs work, but what futures they enable or foreclose.
Authors:Yunhao Yuan, Jiaxun Zhang, Talayeh Aledavood, Renwen Zhang, Koustuv Saha
Abstract:
AI-powered companion chatbots (AICCs) such as Replika are increasingly popular, offering empathetic interactions, yet their psychosocial impacts remain unclear. We examined how engaging with AICCs shaped wellbeing and how users perceived these experiences. First, we conducted a large-scale quasi-experimental study of longitudinal Reddit data, applying stratified propensity score matching and Difference-in-Differences regression. Findings revealed mixed effects -- greater affective and grief expression, readability, and interpersonal focus, alongside increases in language about loneliness and suicidal ideation. Second, we complemented these results with 15 semi-structured interviews, which we thematically analyzed and contextualized using Knapp's relationship development model. We identified trajectories of initiation, escalation, and bonding, wherein AICCs provided emotional validation and social rehearsal but also carried risks of over-reliance and withdrawal. Triangulating across methods, we offer design implications for AI companions that scaffold healthy boundaries, support mindful engagement, support disclosure without dependency, and surface relationship stages -- maximizing psychosocial benefits while mitigating risks.
Authors:Neal Reeves, Elena Simperl
Abstract:
Wikipedia is among the largest examples of collective intelligence on the Web with over 61 million articles covering over 320 languages. Although edited and maintained by an active workforce of human volunteers, Wikipedia is highly reliant on automated bots to fill gaps in its human workforce. As well as administrative and governance tasks, these bots also play a role in generating content, although to date such agents represent the smallest proportion of bots. While there has been considerable analysis of bots and their activity in Wikipedia, such work captures only automated agents that have been actively deployed to Wikipedia and fails to capture the methods that have been proposed to generate Wikipedia content in the wider literature. In this paper, we conduct a systematic literature review to explore how researchers have operationalised and evaluated automated content-generation agents for Wikipedia. We identify the scope of these generation methods, the techniques and models used, the source content used for generation and the evaluation methodologies which support generation processes. We also explore implications of our findings to CSCW, User Generated Content and Wikipedia, as well as research directions for future development. To the best of our knowledge, we are among the first to review the potential contributions of this understudied form of AI support for the Wikipedia community beyond the implementation of bots.
Authors:Stina Sundstedt, Mattias Wingren, Susanne Hägglund, Daniel Ventus
Abstract:
Preschool children with language vulnerabilities -- such as developmental language disorders or immigration related language challenges -- often require support to strengthen their expressive language skills. Based on the principle of implicit learning, speech-language therapists (SLTs) typically embed target morphological structures (e.g., third person -s) into everyday interactions or game-based learning activities. Educators are recommended by SLTs to do the same. This approach demands precise linguistic knowledge and real-time production of various morphological forms (e.g., "Daddy wears these when he drives to work"). The task becomes even more demanding when educators or parent also must keep children engaged and manage turn-taking in a game-based activity. In the TalBot project our multiprofessional team have developed an application in which the Furhat conversational robot plays the word retrieval game "Alias" with children to improve language skills. Our application currently employs a large language model (LLM) to manage gameplay, dialogue, affective responses, and turn-taking. Our next step is to further leverage the capacity of LLMs so the robot can generate and deliver specific morphological targets during the game. We hypothesize that a robot could outperform humans at this task. Novel aspects of this approach are that the robot could ultimately serve as a model and tutor for both children and professionals and that using LLM capabilities in this context would support basic communication needs for children with language vulnerabilities. Our long-term goal is to create a robust LLM-based Robot-Assisted Language Learning intervention capable of teaching a variety of morphological structures across different languages.
Authors:Nana Wang, Gen Li, Zhaoxin Fan, Suli Wang
Abstract:
Cross-user electromyography (EMG)-based gesture recognition represents a fundamental challenge in achieving scalable and personalized human-machine interaction within real-world applications. Despite extensive efforts, existing methodologies struggle to generalize effectively across users due to the intrinsic biological variability of EMG signals, resulting from anatomical heterogeneity and diverse task execution styles. To address this limitation, we introduce EMG-UP, a novel and effective framework for Unsupervised Personalization in cross-user gesture recognition. The proposed framework leverages a two-stage adaptation strategy: (1) Sequence-Cross Perspective Contrastive Learning, designed to disentangle robust and user-specific feature representations by capturing intrinsic signal patterns invariant to inter-user variability, and (2) Pseudo-Label-Guided Fine-Tuning, which enables model refinement for individual users without necessitating access to source domain data. Extensive evaluations show that EMG-UP achieves state-of-the-art performance, outperforming prior methods by at least 2.0% in accuracy.
Authors:Guandong Pan, Yaqian Yang, Shi Chen, Xin Wang, Longzhao Liu, Hongwei Zheng, Shaoting Tang
Abstract:
In affective neuroscience and emotion-aware AI, understanding how complex auditory stimuli drive emotion arousal dynamics remains unresolved. This study introduces a computational framework to model the brain's encoding of naturalistic auditory inputs into dynamic behavioral/neural responses across three datasets (SEED, LIRIS, self-collected BAVE). Guided by neurobiological principles of parallel auditory hierarchy, we decompose audio into multilevel auditory features (through classical algorithms and wav2vec 2.0/Hubert) from the original and isolated human voice/background soundtrack elements, mapping them to emotion-related responses via cross-dataset analyses. Our analysis reveals that high-level semantic representations (derived from the final layer of wav2vec 2.0/Hubert) exert a dominant role in emotion encoding, outperforming low-level acoustic features with significantly stronger mappings to behavioral annotations and dynamic neural synchrony across most brain regions ($p < 0.05$). Notably, middle layers of wav2vec 2.0/hubert (balancing acoustic-semantic information) surpass the final layers in emotion induction across datasets. Moreover, human voices and soundtracks show dataset-dependent emotion-evoking biases aligned with stimulus energy distribution (e.g., LIRIS favors soundtracks due to higher background energy), with neural analyses indicating voices dominate prefrontal/temporal activity while soundtracks excel in limbic regions. By integrating affective computing and neuroscience, this work uncovers hierarchical mechanisms of auditory-emotion encoding, providing a foundation for adaptive emotion-aware systems and cross-disciplinary explorations of audio-affective interactions.
Authors:Juliana Jansen Ferreira, VinÃcius Segura, Joana Gabriela Souza, Joao Henrique Gallas Brasil
Abstract:
People face overwhelming information during work activities, necessitating effective organization and management strategies. Even in personal lives, individuals must keep, annotate, organize, and retrieve knowledge from daily routines. The collection of records for future reference is known as a personal knowledge base. Note-taking applications are valuable tools for building and maintaining these bases, often called a ''second brain''. This paper presents a case study on how people build and explore personal knowledge bases for various purposes. We selected the note-taking tool Obsidian and researchers from a Brazilian lab for an in-depth investigation. Our investigation reveals interesting findings about how researchers build and explore their personal knowledge bases. A key finding is that participants' knowledge retrieval strategy influences how they build and maintain their content. We suggest potential features for an AI system to support this process.
Authors:Zejun Liu, Yunshan Chen, Chengxi Xie, Huan Liu
Abstract:
EEG-based multimodal emotion recognition(EMER) has gained significant attention and witnessed notable advancements, the inherent complexity of human neural systems has motivated substantial efforts toward multimodal approaches. However, this field currently suffers from three critical limitations: (i) the absence of open-source implementations. (ii) the lack of standardized and transparent benchmarks for fair performance analysis. (iii) in-depth discussion regarding main challenges and promising research directions is a notable scarcity. To address these challenges, we introduce LibEMER, a unified evaluation framework that provides fully reproducible PyTorch implementations of curated deep learning methods alongside standardized protocols for data preprocessing, model realization, and experimental setups. This framework enables unbiased performance assessment on three widely-used public datasets across two learning tasks. The open-source library is publicly accessible at: https://anonymous.4open.science/r/2025ULUIUBUEUMUEUR485384
Authors:Tiany Peng, George Gui, Daniel J. Merlau, Grace Jiarui Fan, Malek Ben Sliman, Melanie Brucks, Eric J. Johnson, Vicki Morwitz, Abdullah Althenayyan, Silvia Bellezza, Dante Donati, Hortense Fong, Elizabeth Friedman, Ariana Guevara, Mohamed Hussein, Kinshuk Jerath, Bruce Kogut, Kristen Lane, Hannah Li, Patryk Perkowski, Oded Netzer, Olivier Toubia
Abstract:
Do "digital twins" capture individual responses in surveys and experiments? We run 19 pre-registered studies on a national U.S. panel and their LLM-powered digital twins (constructed based on previously-collected extensive individual-level data) and compare twin and human answers across 164 outcomes. The correlation between twin and human answers is modest (approximately 0.2 on average) and twin responses are less variable than human responses. While constructing digital twins based on rich individual-level data improves our ability to capture heterogeneity across participants and predict relative differences between them, it does not substantially improve our ability to predict the exact answers given by specific participants or enhance predictions of population means. Twin performance varies by domain and is higher among more educated, higher-income, and ideologically moderate participants. These results suggest current digital twins can capture some degree of relative differences but are unreliable for individual-level predictions and sample mean and variance estimation, underscoring the need for careful validation before use. Our data and code are publicly available for researchers and practitioners interested in optimizing digital twin pipelines.
Authors:Panayu Keelawat, Darshan Nere, Jyotshna Bali, Rezky Dwisantika, Yogesh Phalak, Ardalan Kahak, Anekan Naicker, Liang He, Suyi Li, Yan Chen
Abstract:
Strength training carries risk of injury when exercises are performed without supervision. While haptics research has advanced, there remains a gap in how to integrate on-body feedback into intelligent wearables. Developing such a design space requires experiencing feedback in context, yet obtaining functional systems is costly. By addressing these challenges, we introduce FlexGuard, a design space for on-body feedback to support injury prevention in strength training. The design space was derived from nine co-design workshops, where novice trainees and expert trainers DIY'd low-fidelity on-body feedback systems, tried them immediately, and surfaced needs and challenges encountered in real exercising contexts. We then evaluated the space through speed dating, using storyboards to cover the design dimensions. We followed up with workshops to further validate selected dimensions in practice through a proof-of-concept wearable system prototype. Our findings extend the design space for sports and fitness wearables in the context of strength training.
Authors:Yoonseo Choi, Eunhye Kim, Hyunwoo Kim, Donghyun Park, Honggu Lee, Jinyoung Kim, Juho Kim
Abstract:
If 100 people issue the same search query, they may have 100 different goals. While existing work on user-centric AI evaluation highlights the importance of aligning systems with fine-grained user intents, current search evaluation methods struggle to represent and assess this diversity. We introduce BloomIntent, a user-centric search evaluation method that uses user intents as the evaluation unit. BloomIntent first generates a set of plausible, fine-grained search intents grounded on taxonomies of user attributes and information-seeking intent types. Then, BloomIntent provides an automated evaluation of search results against each intent powered by large language models. To support practical analysis, BloomIntent clusters semantically similar intents and summarizes evaluation outcomes in a structured interface. With three technical evaluations, we showed that BloomIntent generated fine-grained, evaluable, and realistic intents and produced scalable assessments of intent-level satisfaction that achieved 72% agreement with expert evaluators. In a case study (N=4), we showed that BloomIntent supported search specialists in identifying intents for ambiguous queries, uncovering underserved user needs, and discovering actionable insights for improving search experiences. By shifting from query-level to intent-level evaluation, BloomIntent reimagines how search systems can be assessed -- not only for performance but for their ability to serve a multitude of user goals.
Authors:Haohan Shi, Yulin Yu, Daniel M. Romero, EmÅke-Ãgnes Horvát
Abstract:
Wikipedia serves as a key infrastructure for public access to scientific knowledge, but it faces challenges in maintaining the credibility of cited sources, especially when scientific papers are retracted. This paper investigates how citations to retracted research are handled on English Wikipedia. We construct a novel dataset that integrates Wikipedia revision histories with metadata from Retraction Watch, Crossref, Altmetric, and OpenAlex, identifying 1,181 citations of retracted papers. We find that 71.6% of all citations analyzed are problematic. These are citations added before a paper's retraction, as well as the citations introduced after retraction without any in-text mention of the paper's retracted status. Our analysis reveals that these citations persist for a median of over 3.68 years (1,344 days). Through survival analysis, we find that signals of human attention are associated with a faster correction process. Unfortunately, a paper's established scholarly authority, a higher academic citation count, is associated with a slower time to correction. Our findings highlight how the Wikipedia community supports collaborative maintenance but leaves gaps in citation-level repair. We contribute to CSCW research by advancing our understanding of this sociotechnical vulnerability, which takes the form of a community coordination challenge, and by offering design directions to support citation credibility at scale.
Authors:Daye Nam, Malgorzata Salawa, Satish Chandra
Abstract:
Evaluating developer satisfaction with conversational AI assistants at scale is critical but challenging. User studies provide rich insights, but are unscalable, while large-scale quantitative signals from logs or in-product ratings are often too shallow or sparse to be reliable. To address this gap, we propose and evaluate a new approach: using sentiment analysis of developer prompts to identify implicit signals of user satisfaction. With an analysis of industrial usage logs of 372 professional developers, we show that this approach can identify a signal in ~8% of all interactions, a rate more than 13 times higher than explicit user feedback, with reasonable accuracy even with an off-the-shelf sentiment analysis approach. This new practical approach to complement existing feedback channels would open up new directions for building a more comprehensive understanding of the developer experience at scale.
Authors:Lin Luo, Yuri Nakao, Mathieu Chollet, Hiroya Inakoshi, Simone Stumpf
Abstract:
Assessing fairness in artificial intelligence (AI) typically involves AI experts who select protected features, fairness metrics, and set fairness thresholds. However, little is known about how stakeholders, particularly those affected by AI outcomes but lacking AI expertise, assess fairness. To address this gap, we conducted a qualitative study with 30 stakeholders without AI expertise, representing potential decision subjects in a credit rating scenario, to examine how they assess fairness when placed in the role of deciding on features with priority, metrics, and thresholds. We reveal that stakeholders' fairness decisions are more complex than typical AI expert practices: they considered features far beyond legally protected features, tailored metrics for specific contexts, set diverse yet stricter fairness thresholds, and even preferred designing customized fairness. Our results extend the understanding of how stakeholders can meaningfully contribute to AI fairness governance and mitigation, underscoring the importance of incorporating stakeholders' nuanced fairness judgments.
Authors:Scott E. Allen, A. David Redish, René F. Kizilcec
Abstract:
Learning underlies nearly all human behavior and is central to education and education reform. Although recent advances in neuroscience have revealed the fundamental structure of learning processes, these insights have yet to be integrated into research and practice. Specifically, neuroscience has found that decision-making is governed by a structured process of perception, action-selection, and execution, supported by multiple neural systems with distinct memory stores and learning mechanisms. These systems extract different types of information (categorical, predictive, structural, and sequential) challenging canonical models of memory used in learning and behavioral science research by providing a mechanistic account of how humans acquire and use knowledge. Because each system learns differently, effective teaching requires alignment with system-specific processes. We propose a unified model that integrates these neuroscientific insights, bridging basic mechanisms with outcomes in education, identity, belonging, and wellbeing. By translating first principles of neural information processing into a generalizable framework, this work advances theories of skill acquisition and transfer while establishing a foundation for interdisciplinary research to refine how learning is understood and supported across domains of human behavior.
Authors:YeEun Lee, Dakyeom Ahn, JungYu Kwon, SeungJi Lee, Hajin Lim
Abstract:
Digital gift-giving has become a key means of maintaining social relationships, but most existing research has focused on gifting within global e-commerce or social media platforms. The emergence of messenger-based gifting in East Asia, where Korea, Japan, and China each have distinct and deeply rooted gifting traditions, remains underexplored. This study examines how in-app gifting services on the most widely used messaging platforms in South Korea (KakaoTalk), Japan (LINE), and China (WeChat) reflect and reshape culturally embedded gifting practices. Through semi-structured interviews with 26 university students, we found that KakaoTalk facilitates frequent, informal exchanges aligned with Korea's emphasis on broad social ties; LINE supports selective and carefully presented gifts, reflecting Japanese norms of formality and sincerity; and WeChat's Hongbao feature enables playful, communal monetary exchanges largely detached from traditional, obligation-driven gifting. Drawing on these findings, we propose the Channel-Oriented Gifting Cycle model, which extends classical gift-exchange theory by showing that the choice of gifting platform is not merely logistical but a culturally meaningful part of the gifting process. We conclude with design implications for culturally sensitive digital gifting services.
Authors:Wenjie Lin, Hange Liu, Xutao Mao, Yingying Zhuang, Jingwei Shi, Xudong Han, Tianyu Shi, Jinrui Yang
Abstract:
We present ParlAI Vote, an interactive system for exploring European Parliament debates and votes, and for testing LLMs on vote prediction and bias analysis. This platform connects debate topics, speeches, and roll-call outcomes, and includes rich demographic data such as gender, age, country, and political group. Users can browse debates, inspect linked speeches, compare real voting outcomes with predictions from frontier LLMs, and view error breakdowns by demographic group. Visualizing the EuroParlVote benchmark and its core tasks of gender classification and vote prediction, ParlAI Vote highlights systematic performance bias in state-of-the-art LLMs. The system unifies data, models, and visual analytics in a single interface, lowering the barrier for reproducing findings, auditing behavior, and running counterfactual scenarios. It supports research, education, and public engagement with legislative decision-making, while making clear both the strengths and the limitations of current LLMs in political analysis.
Authors:Dongyijie Primo Pan, Ji Zhu, Lan Luo, Zhiqi Gao, Xin Tong, Pan Hui
Abstract:
This paper investigates how large language models (LLMs) are reshaping competitive programming. The field functions as an intellectual contest within computer science education and is marked by rapid iteration, real-time feedback, transparent solutions, and strict integrity norms. Prior work has evaluated LLMs performance on contest problems, but little is known about how human stakeholders -- contestants, problem setters, coaches, and platform stewards -- are adapting their workflows and contest norms under LLMs-induced shifts. At the same time, rising AI-assisted misuse and inconsistent governance expose urgent gaps in sustaining fairness and credibility. Drawing on 37 interviews spanning all four roles and a global survey of 207 contestants, we contribute: (i) an empirical account of evolving workflows, (ii) an analysis of contested fairness norms, and (iii) a chess-inspired governance approach with actionable measures -- real-time LLMs checks in online contests, peer co-monitoring and reporting, and cross-validation against offline performance -- to curb LLMs-assisted misuse while preserving fairness, transparency, and credibility.
Authors:Mahesh Shakya, Bijay Adhikari, Nirsara Shrestha, Bipin Koirala, Arun Adhikari, Prasanta Poudyal, Luna Mathema, Sarbagya Buddhacharya, Bijay Khatri, Bishesh Khanal
Abstract:
Vision- and hearing-threatening diseases cause preventable disability, especially in resource-constrained settings(RCS) with few specialists and limited screening setup. Large scale AI-assisted screening and telehealth has potential to expand early detection, but practical deployment is challenging in paper-based workflows and limited documented field experience exist to build upon. We provide insights on challenges and ways forward in development to adoption of scalable AI-assisted Telehealth and screening in such settings. Specifically, we find that iterative, interdisciplinary collaboration through early prototyping, shadow deployment and continuous feedback is important to build shared understanding as well as reduce usability hurdles when transitioning from paper-based to AI-ready workflows. We find public datasets and AI models highly useful despite poor performance due to domain shift. In addition, we find the need for automated AI-based image quality check to capture gradable images for robust screening in high-volume camps. Our field learning stress the importance of treating AI development and workflow digitization as an end-to-end, iterative co-design process. By documenting these practical challenges and lessons learned, we aim to address the gap in contextual, actionable field knowledge for building real-world AI-assisted telehealth and mass-screening programs in RCS.
Authors:Travis Lloyd, Tung Nguyen, Karen Levy, Mor Naaman
Abstract:
Social media platforms are increasingly developing features that display crowdsourced context alongside posts, modeled after X's Community Notes. These systems, which we term Crowdsourced Context Systems (CCS), have the potential to reshape our information ecosystem as major platforms embrace them as alternatives to top-down fact-checking. To deeply understand the features and implications of such systems, we perform a systematic literature review of existing CCS research and analyze several real-world CSS implementations. Based on our analysis, we develop a framework with three distinct components. First, we present a theoretical model to help conceptualize and define CCS. Second, we identify a design space encompassing six key aspects of CCS: participation, inputs, curation, presentation, platform treatment, and transparency. Third, we identify key normative implications of different CCS design and implementation choices. Our framework integrates these theoretical, design, and ethical perspectives to establish a foundation for future human-centered research on Crowdsourced Context Systems.
Authors:Ramazan Yener, Guan-Hung Chen, Ece Gumusel, Masooda Bashir
Abstract:
As Conversational Artificial Intelligence (AI) becomes more integrated into everyday life, AI-powered chatbot mobile applications are increasingly adopted across industries, particularly in the healthcare domain. These chatbots offer accessible and 24/7 support, yet their collection and processing of sensitive health data present critical privacy concerns. While prior research has examined chatbot security, privacy issues specific to AI healthcare chatbots have received limited attention. Our study evaluates the privacy practices of 12 widely downloaded AI healthcare chatbot apps available on the App Store and Google Play in the United States. We conducted a three-step assessment analyzing: (1) privacy settings during sign-up, (2) in-app privacy controls, and (3) the content of privacy policies. The analysis identified significant gaps in user data protection. Our findings reveal that half of the examined apps did not present a privacy policy during sign up, and only two provided an option to disable data sharing at that stage. The majority of apps' privacy policies failed to address data protection measures. Moreover, users had minimal control over their personal data. The study provides key insights for information science researchers, developers, and policymakers to improve privacy protections in AI healthcare chatbot apps.
Authors:Lisa-Yao Gan, Arunav Das, Johanna Walker, Elena Simperl
Abstract:
Open data portals are essential for providing public access to open datasets. However, their search interfaces typically rely on keyword-based mechanisms and a narrow set of metadata fields. This design makes it difficult for users to find datasets using natural language queries. The problem is worsened by metadata that is often incomplete or inconsistent, especially when users lack familiarity with domain-specific terminology. In this paper, we examine how individual metadata fields affect the success of conversational dataset retrieval and whether LLMs can help bridge the gap between natural queries and structured metadata. We conduct a controlled ablation study using simulated natural language queries over real-world datasets to evaluate retrieval performance under various metadata configurations. We also compare existing content of the metadata field 'description' with LLM-generated content, exploring how different prompting strategies influence quality and impact on search outcomes. Our findings suggest that dataset descriptions play a central role in aligning with user intent, and that LLM-generated descriptions can support effective retrieval. These results highlight both the limitations of current metadata practices and the potential of generative models to improve dataset discoverability in open data portals.
Authors:Sami Ul Haq, Sheila Castilho, Yvette Graham
Abstract:
Machine Translation (MT) has achieved remarkable performance, with growing interest in speech translation and multimodal approaches. However, despite these advancements, MT quality assessment remains largely text centric, typically relying on human experts who read and compare texts. Since many real-world MT applications (e.g Google Translate Voice Mode, iFLYTEK Translator) involve translation being spoken rather printed or read, a more natural way to assess translation quality would be through speech as opposed text-only evaluations. This study compares text-only and audio-based evaluations of 10 MT systems from the WMT General MT Shared Task, using crowd-sourced judgments collected via Amazon Mechanical Turk. We additionally, performed statistical significance testing and self-replication experiments to test reliability and consistency of audio-based approach. Crowd-sourced assessments based on audio yield rankings largely consistent with text only evaluations but, in some cases, identify significant differences between translation systems. We attribute this to speech richer, more natural modality and propose incorporating speech-based assessments into future MT evaluation frameworks.
Authors:HwiJoon Lee, Martina Di Paola, Yoo Jin Hong, Quang-Huy Nguyen, Joseph Seering
Abstract:
LLM-based multi-agent simulations are a rapidly growing field of research, but current simulations often lack clear modes for interaction and analysis, limiting the "what if" scenarios researchers are able to investigate. In this demo, we define three core operations for interacting with multi-agent simulations: inject, fork, and compare. Inject allows researchers to introduce external events at any point during simulation execution. Fork creates independent timeline branches from any timestamp, preserving complete state while allowing divergent exploration. Compare facilitates parallel observation of multiple branches, revealing how different interventions lead to distinct emergent behaviors. Together, these operations establish a vocabulary that transforms linear simulation workflows into interactive, explorable spaces. We demonstrate this vocabulary through a commodity market simulation with fourteen AI agents, where researchers can inject contrasting events and observe divergent outcomes across parallel timelines. By defining these fundamental operations, we provide a starting point for systematic causal investigation in LLM-based agent simulations, moving beyond passive observation toward active experimentation.
Authors:Zihao Li, Weiwei Yi, Jiahong Chen
Abstract:
As Large Language Models (LLMs) permeate everyday decision-making, their epistemic and societal risks demand urgent scrutiny. Hallucinations, the generation of fabricated, misleading, oversimplified or untrustworthy outputs, has emerged as imperative challenges. While regulatory, academic, and technical discourse position accuracy as the principal benchmark for mitigating such harms, this article contends that overreliance on accuracy misdiagnoses the problem and has counterproductive effect: the accuracy paradox. Drawing on interdisciplinary literatures, this article develops a taxonomy of hallucination types and shows the paradox along three intertwining dimensions: outputs, individuals and society. First, accuracy functions as a superficial proxy for reliability, incentivising the optimisation of rhetorical fluency and surface-level correctness over epistemic trustworthiness. This encourages passive user trust in outputs that appear accurate but epistemically untenable. Second, accuracy as a singular metric fails to detect harms that are not factually false but are nonetheless misleading, value-laden, or socially distorting, including consensus illusions, sycophantic alignment, and subtle manipulation. Third, regulatory overemphasis on accuracy obscures the wider societal consequences of hallucination, including social sorting, privacy violations, equity harms, epistemic convergence that marginalises dissent, reduces pluralism, and causes social deskilling. By examining the EU AI Act, GDPR, and DSA, the article argues that current regulations are not yet structurally equipped to address these epistemic, relational, and systemic harms and exacerbated by the overreliance on accuracy. By exposing such conceptual and practical challenges, this article calls for a fundamental shift towards pluralistic, context-aware, and manipulation-resilient approaches to AI trustworthy governance.
Authors:Henrik Axelsen, Valdemar Licht, Jan Damsgaard
Abstract:
The cost and complexity of financial crime compliance (FCC) continue to rise, often without measurable improvements in effectiveness. While AI offers potential, most solutions remain opaque and poorly aligned with regulatory expectations. This paper presents the design and deployment of an agentic AI system for FCC in digitally native financial platforms. Developed through an Action Design Research (ADR) process with a fintech firm and regulatory stakeholders, the system automates onboarding, monitoring, investigation, and reporting, emphasizing explainability, traceability, and compliance-by-design. Using artifact-centric modeling, it assigns clearly bounded roles to autonomous agents and enables task-specific model routing and audit logging. The contribution includes a reference architecture, a real-world prototype, and insights into how Agentic AI can reconfigure FCC workflows under regulatory constraints. Our findings extend IS literature on AI-enabled compliance by demonstrating how automation, when embedded within accountable governance structures, can support transparency and institutional trust in high-stakes, regulated environments.
Authors:Tim Bary, Benoît Macq, Louis Petit
Abstract:
AI systems often fail to deliver reliable predictions across all inputs, prompting the need for hybrid human-AI decision-making. Existing Learning to Defer (L2D) approaches address this by training deferral models, but these are sensitive to changes in expert composition and require significant retraining if experts change. We propose a training-free, model- and expert-agnostic framework for expert deferral based on conformal prediction. Our method uses the prediction set generated by a conformal predictor to identify label-specific uncertainty and selects the most discriminative expert using a segregativity criterion, measuring how well an expert distinguishes between the remaining plausible labels. Experiments on CIFAR10-H and ImageNet16-H show that our method consistently outperforms both the standalone model and the strongest expert, with accuracies attaining $99.57\pm0.10\%$ and $99.40\pm0.52\%$, while reducing expert workload by up to a factor of $11$. The method remains robust under degraded expert performance and shows a gradual performance drop in low-information settings. These results suggest a scalable, retraining-free alternative to L2D for real-world human-AI collaboration.
Authors:Zinat Ara, Imtiaz Bin Rahim, Puqi Zhou, Liuchuan Yu, Behzad Esmaeili, Lap-Fai Yu, Sungsoo Ray Hong
Abstract:
Adults with Attention Deficit Hyperactivity Disorder (ADHD) experience challenges sustaining attention in the workplace. Body doubling, the concept of working alongside another person, has been proposed as a productivity aid for ADHD and other neurodivergent populations (NDs). However, prior work found no conclusive effectiveness and noted NDs' discomfort with social presence. This work investigates body doubling as an ADHD centered productivity strategy in construction tasks. In Study 1, we explored challenges ADHD workers face in construction and identified design insights. In Study 2, we implemented a virtual reality bricklaying task under three conditions: (C1) alone, (C2) with a human body double, and (C3) with an AI body double. Results from 12 participants show they finished tasks faster and perceived greater accuracy and sustained attention in C2 and C3 compared to C1. While body doubling was clearly preferred, opinions diverged between conditions. Our findings verify its effect and offer design implications for future interventions.
Authors:Liangxuan Guo, Bin Zhu, Qingqian Tao, Kangning Liu, Xun Zhao, Xianzhe Qin, Jin Gao, Guangfu Hao
Abstract:
Autonomous agents for desktop automation struggle with complex multi-step tasks due to poor coordination and inadequate quality control. We introduce Agentic Lybic, a novel multi-agent system where the entire architecture operates as a finite-state machine (FSM). This core innovation enables dynamic orchestration. Our system comprises four components: a Controller, a Manager, three Workers (Technician for code-based operations, Operator for GUI interactions, and Analyst for decision support), and an Evaluator. The critical mechanism is the FSM-based routing between these components, which provides flexibility and generalization by dynamically selecting the optimal execution strategy for each subtask. This principled orchestration, combined with robust quality gating, enables adaptive replanning and error recovery. Evaluated officially on the OSWorld benchmark, Agentic Lybic achieves a state-of-the-art 57.07% success rate in 50 steps, substantially outperforming existing methods. Results demonstrate that principled multi-agent orchestration with continuous quality control provides superior reliability for generalized desktop automation in complex computing environments.
Authors:Siying Hu, Zhenhao Zhang
Abstract:
Sedentary behavior is a critical health risk for older adults. While digital interventions exist, they often rely on screen-based notifications that feel clinical and are easily ignored. This paper presents a Research through Design inquiry into data physicalization as a humane alternative. We designed and deployed tangible artifacts that ambiently represent sedentary patterns in older adults' homes. These artifacts transform abstract data into aesthetic, evolving forms, becoming part of the domestic landscape. Through a long-term in-situ study, our analysis reveals these physicalizations fostered self-reflection, family conversations, and prompted reflection on activity. Our work contributes empirical design principles for tangible health interventions that are both evocative and actionable. We demonstrate how qualities like aesthetic ambiguity and slow revelation can empower older adults, fostering a reflective relationship with their wellbeing. We argue this approach signals a necessary shift from merely informing users to enabling them to live with and through their data.
Authors:Siying Hu, Zhenhao Zhang
Abstract:
Vocabulary acquisition in early education often relies on rote memorization and passive screen-based tools, which can fail to engage students kinesthetically and collaboratively. This paper introduces Vocabuild, an augmented tangible interface designed to transform vocabulary learning into an active, embodied, and playful experience. The system combines physical letter blocks with a projection-augmented surface. As children physically construct words with the blocks, the system provides real-time, dynamic feedback, such as displaying corresponding images and animations, thus helping them construct semantic meaning. Deployed in a classroom context, our gamified approach fosters both individual exploration and peer collaboration. A user study conducted with elementary school children demonstrates that our tangible interface leads to higher engagement, increased collaboration, and a more positive attitude towards learning compared to traditional methods. Our contributions are twofold: (1) the design and implementation of Vocabuild, a projection-augmented tangible system that transforms vocabulary learning into an embodied and collaborative activity; and (2) empirical findings from a classroom study showing that our tangible approach significantly increases engagement, peer collaboration, and positive learning attitudes compared to traditional methods.
Authors:Omar El Malki, Marianne Aubin Le Quéré, Andrés Monroy-Hernández, Manoel Horta Ribeiro
Abstract:
Modern social media feeds use predictive models to maximize engagement, often misaligning how people consume content with how they wish to. We introduce Bonsai, a system that enables people to build personalized and intentional feeds. Bonsai implements a platform-agnostic framework comprising Planning, Sourcing, Curating, and Ranking modules. Altogether, this framework allows users to express their intent in natural language and exert fine-grained control over a procedurally transparent feed creation process. We evaluated the system with 15 Bluesky users in a two-phase, multi-week study. We find that participants successfully used our system to discover new content, filter out irrelevant or toxic posts, and disentangle engagement from intent, but curating intentional feeds required participants to exert more effort than they are used to. Simultaneously, users sought system transparency mechanisms to trust and effectively use intentional, personalized feeds. Overall, our work highlights intentional feedbuilding as a viable path beyond engagement-based optimization.
Authors:Adnan Ahmad, Philine Kowol, Stefan Hillmann, Sebastian Möller
Abstract:
In this paper, we provide an extensive analysis of multi-label intent classification using Large Language Models (LLMs) that are open-source, publicly available, and can be run in consumer hardware. We use the MultiWOZ 2.1 dataset, a benchmark in the dialogue system domain, to investigate the efficacy of three popular open-source pre-trained LLMs, namely LLama2-7B-hf, Mistral-7B-v0.1, and Yi-6B. We perform the classification task in a few-shot setup, giving 20 examples in the prompt with some instructions. Our approach focuses on the differences in performance of these models across several performance metrics by methodically assessing these models on multi-label intent classification tasks. Additionally, we compare the performance of the instruction-based fine-tuning approach with supervised learning using the smaller transformer model BertForSequenceClassification as a baseline. To evaluate the performance of the models, we use evaluation metrics like accuracy, precision, and recall as well as micro, macro, and weighted F1 score. We also report the inference time, VRAM requirements, etc. The Mistral-7B-v0.1 outperforms two other generative models on 11 intent classes out of 14 in terms of F-Score, with a weighted average of 0.50. It also has relatively lower Humming Loss and higher Jaccard Similarity, making it the winning model in the few-shot setting. We find BERT based supervised classifier having superior performance compared to the best performing few-shot generative LLM. The study provides a framework for small open-source LLMs in detecting complex multi-intent dialogues, enhancing the Natural Language Understanding aspect of task-oriented chatbots.
Authors:Poorna Talkad Sukumar, Maurizio Porfiri, Oded Nov
Abstract:
Racial homophily refers to the tendency of individuals to associate with others of the same racial or ethnic background. A recent study found no evidence of racial homophily in responses to mass shooting data visualizations. To increase the likelihood of detecting an effect, we redesigned the experiment by replacing bar charts with anthropographics and expanding the sample size. In a crowdsourced study (N=720), we showed participants a pictograph of mass shooting victims in the United States, with victims from one of three racial groups (Hispanic, Black, or White) highlighted. Each participant was assigned a visualization highlighting either their own racial group or a different racial group, allowing us to assess the influence of racial concordance on changes in affect (emotion). We found that, across all conditions, racial concordance had a modest but significant effect on changes in affect, with participants experiencing greater negative affect change when viewing visualizations highlighting their own race. This study provides initial evidence that racial homophily can emerge in responses to data visualizations, particularly when using anthropographics.
Authors:Arleth Salinas, Irtaza Sohail, Valerio Pascucci, Pantelis Stefanakis, Saud Amjad, Aashish Panta, Roland Schigas, Timothy Chun-Yiu Chui, Nicolas Duboc, Mostafa Farrokhabadi, Roland Stull
Abstract:
Data reuse is using data for a purpose distinct from its original intent. As data sharing becomes more prevalent in science, enabling effective data reuse is increasingly important. In this paper, we present a power systems case study of data repurposing for enabling data reuse. We define data repurposing as the process of transforming data to fit a new research purpose. In our case study, we repurpose a geospatial wildfire smoke forecast dataset into a historical dataset. We analyze its efficacy toward analyzing wildfire smoke impact on solar photovoltaic energy production. We also provide documentation and interactive demos for using the repurposed dataset. We identify key enablers of data reuse including metadata standardization, contextual documentation, and communication between data creators and reusers. We also identify obstacles to data reuse such as risk of misinterpretation and barriers to efficient data access. Through an iterative approach to data repurposing, we demonstrate how leveraging and expanding knowledge transfer infrastructures like online documentation, interactive visualizations, and data streaming directly address these obstacles. The findings facilitate big data use from other domains for power systems applications and grid resiliency.
Authors:Nikhita Joshi, Daniel Vogel
Abstract:
AI capabilities for document reader software are usually presented in separate chat interfaces. We explore integrating AI into document comments, a concept we formalize as AI margin notes. Three design parameters characterize this approach: margin notes are integrated with the text while chat interfaces are not; selecting text for a margin note can be automated through AI or manual; and the generation of a margin note can involve AI to various degrees. Two experiments investigate integration and selection automation, with results showing participants prefer integrated AI margin notes and manual selection. A third experiment explores human and AI involvement through six alternative techniques. Techniques with less AI involvement resulted in more psychological ownership, but faster and less effortful designs are generally preferred. Surprisingly, the degree of AI involvement had no measurable effect on reading comprehension. Our work shows that AI margin notes are desirable and contributes implications for their design.
Authors:Sandeep Banik, Naira Hovakimyan
Abstract:
Shared autonomy requires principled mechanisms for allocating and transferring control between a human and an autonomous agent. Existing approaches often rely on blending control inputs between human and autonomous agent or switching rules, which lack theoretical guarantees. This paper develops a game-theoretic framework for modeling cooperative takeover in shared autonomy. We formulate the switching interaction as a dynamic game in which authority is embedded directly into the system dynamics, resulting in Nash equilibrium(NE)-based strategies rather than ad hoc switching rules. We establish the existence and characterization of NE in the space of pure takeover strategies under stochastic human intent. For the class of linear-quadratic systems, we derive closed-form recursions for takeover strategies and saddle-point value functions, providing analytical insight and efficient computation of cooperative takeover policies. We further introduce a bimatrix potential game reformulation to address scenarios where human and autonomy utilities are not perfectly aligned, yielding a unifying potential function that preserves tractability while capturing intent deviations. The framework is applied to a vehicle trajectory tracking problem, demonstrating how equilibrium takeover strategies adapt across straight and curved path segments. The results highlight the trade-off between human adaptability and autonomous efficiency and illustrate the practical benefits of grounding shared autonomy in cooperative game theory.
Authors:Zhaoxun "Lorenz" Liu, Wagner H. Souza, Jay Han, Amin Madani
Abstract:
Mass casualty incidents (MCIs) overwhelm healthcare systems and demand rapid, accurate patient-hospital allocation decisions under extreme pressure. Here, we developed and validated a deep reinforcement learning-based decision-support AI agent to optimize patient transfer decisions during simulated MCIs by balancing patient acuity levels, specialized care requirements, hospital capacities, and transport logistics. To integrate this AI agent, we developed MasTER, a web-accessible command dashboard for MCI management simulations. Through a controlled user study with 30 participants (6 trauma experts and 24 non-experts), we evaluated three interaction approaches with the AI agent (human-only, human-AI collaboration, and AI-only) across 20- and 60-patient MCI scenarios in the Greater Toronto Area. Results demonstrate that increasing AI involvement significantly improves decision quality and consistency. The AI agent outperforms trauma surgeons (p < 0.001) and enables non-experts to achieve expert-level performance when assisted, contrasting sharply with their significantly inferior unassisted performance (p < 0.001). These findings establish the potential for our AI-driven decision support to enhance both MCI preparedness training and real-world emergency response management.
Authors:Sarigai Sarigai, Liping Yang, Katie Slack, Carolyn Fish, Michaela Buenemann, Qiusheng Wu, Yan Lin, Joseph A. Cook, David Jacobs
Abstract:
As interactive web-based geovisualization becomes increasingly vital across disciplines, there is a growing need for open-source frameworks that support dynamic, multi-attribute spatial analysis and accessible design. This paper introduces dciWebMapper2, a significant expansion of the original dciWebMapper framework, designed to enable exploratory analysis across domains such as climate justice, food access, and social vulnerability. The enhanced framework integrates multiple map types, including choropleth, proportional symbol, small multiples, and heatmaps, with linked statistical charts (e.g., scatter plots, boxplots) and time sliders, all within a coordinated-view environment. Dropdown-based controls allow flexible, high-dimensional comparisons while maintaining visual clarity. Grounded in cartographic and information visualization principles, dciWebMapper2 is fully open-source, self-contained, and server-free, supporting modularity, reproducibility, and long-term sustainability. Three applied use cases demonstrate its adaptability and potential to democratize interactive web cartography. This work offers a versatile foundation for inclusive spatial storytelling and transparent geospatial analysis in research, education, and civic engagement.
Authors:Moyan Zhou, Soobin Cho, Loren Terveen
Abstract:
Large language models (LLMs) are reshaping knowledge production as community members increasingly incorporate them into their contribution workflows. However, participating in knowledge communities involves more than just contributing content - it is also a deeply social process. While communities must carefully consider appropriate and responsible LLM integration, the absence of concrete norms has left individual editors to experiment and navigate LLM use on their own. Understanding how LLMs influence community participation is therefore critical in shaping future norms and supporting effective adoption. To address this gap, we investigated Wikipedia, one of the largest knowledge production communities, to understand 1) how LLMs influence the ways editors contribute content, 2) what strategies editors leverage to align LLM outputs with community norms, and 3) how other editors in the community respond to LLM-assisted contributions. Through interviews with 16 Wikipedia editors who had used LLMs for their edits, we found that 1) LLMs affected the content contributions for experienced and new editors differently; 2) aligning LLM outputs with community norms required tacit knowledge that often challenged newcomers; and 3) as a result, other editors responded to LLM-assisted edits differently depending on the editors' expertise level. Based on these findings, we challenge existing models of newcomer involvement and propose design implications for LLMs that support community engagement through scaffolding, teaching, and context awareness.
Authors:Jackie Liu, Mehrnoosh Sadat Shirvani, Hwajung Hong, Ig-Jae Kim, Dongwook Yoon
Abstract:
Social media clones are AI-powered social delegates of ourselves created using our personal data. As our identities and online personas intertwine, these technologies have the potential to greatly enhance our social media experience. If mismanaged, however, these clones may also pose new risks to our social reputation and online relationships. To set the foundation for a productive and responsible integration, we set out to understand how social media clones will impact our online behavior and interactions. We conducted a series of semi-structured interviews introducing eight speculative clone concepts to 32 social media users through a design workbook. Applying existing work in AI-mediated communication in the context of social media, we found that although clones can offer convenience and comfort, they can also threaten the user's authenticity and increase skepticism within the online community. As a result, users tend to behave more like their clones to mitigate discrepancies and interaction breakdowns. These findings are discussed through the lens of past literature in identity and impression management to highlight challenges in the adoption of social media clones by the general public, and propose design considerations for their successful integration into social media platforms.
Authors:Huihong Liang, Dongxuan Jia, Youquan Wang, Longtao Huang, Shida Zhong, Luping Xiang, Lei Huang, Tao Yuan
Abstract:
In this demo, we present a compact intelligent audio system-on-chip (SoC) integrated with a keyword spotting accelerator, enabling ultra-low latency, low-power, and low-cost voice interaction in Internet of Things (IoT) devices. Through algorithm-hardware co-design, the system's energy efficiency is maximized. We demonstrate the system's capabilities through a live FPGA-based prototype, showcasing stable performance and real-time voice interaction for edge intelligence applications.
Authors:Mehrnoosh Sadat Shirvani, Jackie Liu, Thomas Chao, Suky Martinez, Laura Brandt, Ig-Jae Kim, Dongwook Yoon
Abstract:
Mental health conversational agents have the potential to deliver valuable therapeutic impact, but low user engagement remains a critical barrier hindering their efficacy. Existing therapeutic approaches have leveraged clients' internal dialogues (e.g., journaling, talking to an empty chair) to enhance engagement through accountable, self-sourced support. Inspired by these, we designed novel AI-driven self-clone chatbots that replicate users' support strategies and conversational patterns to improve therapeutic engagement through externalized meaningful self-conversation. Validated through a semi-controlled experiment (N=180), significantly higher emotional and cognitive engagement was demonstrated with self-clone chatbots than a chatbot with a generic counselor persona. Our findings highlight self-clone believability as a mediator and emphasize the balance required in maintaining convincing self-representation while creating positive interactions. This study contributes to AI-based mental health interventions by introducing and evaluating self-clones as a promising approach to increasing user engagement, while exploring implications for their application in mental health care.
Authors:Jiyoon Pyo, Yuankun Jiao, Yao-Yi Chiang, Michael Corey
Abstract:
Though no longer legally enforceable, racial covenants in twentieth-century property deeds continue to shape spatial and socioeconomic inequalities. Understanding this legacy requires identifying racially restrictive language and geolocating affected properties. The Mapping Prejudice project addresses this by engaging volunteers on the Zooniverse crowdsourcing platform to transcribe covenants from scanned deeds and link them to modern parcel maps using transcribed legal descriptions. While the project has explored automation, it values crowdsourcing for its social impact and technical advantages. Historically, Mapping Prejudice relied on lexicon-based searching and, more recently, fuzzy matching to flag suspected covenants. However, fuzzy matching has increased false positives, burdening volunteers and raising scalability concerns. Additionally, while many properties can be mapped automatically, others still require time-intensive manual geolocation. We present a human-centered computing approach with two plug-and-play NLP pipelines: (1) a context-aware text labeling model that flags racially restrictive language with high precision and (2) a georeferencing module that extracts geographic descriptions from deeds and resolves them to real-world locations. Evaluated on historical deed documents from six counties in Minnesota and Wisconsin, our system reduces false positives in racial term detection by 25.96% while maintaining 91.73% recall and achieves 85.58% georeferencing accuracy within 1x1 square-mile ranges. These tools enhance document filtering and enrich spatial annotations, accelerating volunteer participation and reducing manual cleanup while strengthening public engagement.
Authors:Sebastian Hubenschmid, Marc Satkowski, Johannes Zagermann, Julián Méndez, Niklas Elmqvist, Steven Feiner, Tiara Feuchtner, Jens Emil Grønbæk, Benjamin Lee, Dieter Schmalstieg, Raimund Dachselt, Harald Reiterer
Abstract:
We investigate hybrid user interfaces (HUIs), aiming to establish a cohesive understanding and adopt consistent terminology for this nascent research area. HUIs combine heterogeneous devices in complementary roles, leveraging the distinct benefits of each. Our work focuses on cross-device interaction between 2D devices and mixed reality environments, which are particularly compelling, leveraging the familiarity of traditional 2D platforms while providing spatial awareness and immersion. Although such HUIs have been prominently explored in the context of mixed reality by prior work, we still lack a cohesive understanding of the unique design possibilities and challenges of such combinations, resulting in a fragmented research landscape. We conducted a systematic survey and present a taxonomy of HUIs that combine conventional display technology and mixed reality environments. Based on this, we discuss past and current challenges, the evolution of definitions, and prospective opportunities to tie together the past 30 years of research with our vision of future HUIs.
Authors:Christian Masuhr, Julian Koch, Thorsten Schüppstuhl
Abstract:
Rigorous evaluation of commercial Augmented Reality (AR) hardware is crucial, yet public benchmarks for tool tracking on modern Head-Mounted Displays (HMDs) are limited. This paper addresses this gap by systematically assessing the Magic Leap 2 (ML2) controllers tracking performance. Using a robotic arm for repeatable motion (EN ISO 9283) and an optical tracking system as ground truth, our protocol evaluates static and dynamic performance under various conditions, including realistic paths from a hydrogen leak inspection use case. The results provide a quantitative baseline of the ML2 controller's accuracy and repeatability and present a robust, transferable evaluation methodology. The findings provide a basis to assess the controllers suitability for the inspection use case and similar industrial sensor-based AR guidance tasks.
Authors:Eneko Atxa Landa, Elena Lazkano, Igor Rodriguez, Itsaso RodrÃguez-Moreno, Itziar Irigoien
Abstract:
Animating realistic avatars requires using high quality animations for every possible state the avatar can be in. This includes actions like walking or running, but also subtle movements that convey emotions and personality. Idle animations, such as standing, breathing or looking around, are crucial for realism and believability. In games and virtual applications, these are often handcrafted or recorded with actors, but this is costly. Furthermore, recording realistic idle animations can be very complex, because the actor must not know they are being recorded in order to make genuine movements. For this reasons idle animation datasets are not widely available. Nevertheless, this paper concludes that both acted and genuine idle animations are perceived as real, and that users are not able to distinguish between them. It also states that handmade and recorded idle animations are perceived differently. These two conclusions mean that recording idle animations should be easier than it is thought to be, meaning that actors can be specifically told to act the movements, significantly simplifying the recording process. These conclusions should help future efforts to record idle animation datasets. Finally, we also publish ReActIdle, a 3 dimensional idle animation dataset containing both real and acted idle motions.
Authors:Melik Ozolcer, Sang Won Bae
Abstract:
This paper introduces SePA (Search-enhanced Predictive AI Agent), a novel LLM health coaching system that integrates personalized machine learning and retrieval-augmented generation to deliver adaptive, evidence-based guidance. SePA combines: (1) Individualized models predicting daily stress, soreness, and injury risk from wearable sensor data (28 users, 1260 data points); and (2) A retrieval module that grounds LLM-generated feedback in expert-vetted web content to ensure contextual relevance and reliability. Our predictive models, evaluated with rolling-origin cross-validation and group k-fold cross-validation show that personalized models outperform generalized baselines. In a pilot expert study (n=4), SePA's retrieval-based advice was preferred over a non-retrieval baseline, yielding meaningful practical effect (Cliff's $δ$=0.3, p=0.05). We also quantify latency performance trade-offs between response quality and speed, offering a transparent blueprint for next-generation, trustworthy personal health informatics systems.
Authors:Meisam Jamshidi Seikavandi, Fabricio Batista Narcizo, Ted Vucurevich, Andrew Burke Dittberner, Paolo Burelli
Abstract:
We present MuMTAffect, a novel Multimodal Multitask Affective Embedding Network designed for joint emotion classification and personality prediction (re-identification) from short physiological signal segments. MuMTAffect integrates multiple physiological modalities pupil dilation, eye gaze, facial action units, and galvanic skin response using dedicated, transformer-based encoders for each modality and a fusion transformer to model cross-modal interactions. Inspired by the Theory of Constructed Emotion, the architecture explicitly separates core affect encoding (valence/arousal) from higher-level conceptualization, thereby grounding predictions in contemporary affective neuroscience. Personality trait prediction is leveraged as an auxiliary task to generate robust, user-specific affective embeddings, significantly enhancing emotion recognition performance. We evaluate MuMTAffect on the AFFEC dataset, demonstrating that stimulus-level emotional cues (Stim Emo) and galvanic skin response substantially improve arousal classification, while pupil and gaze data enhance valence discrimination. The inherent modularity of MuMTAffect allows effortless integration of additional modalities, ensuring scalability and adaptability. Extensive experiments and ablation studies underscore the efficacy of our multimodal multitask approach in creating personalized, context-aware affective computing systems, highlighting pathways for further advancements in cross-subject generalisation.
Authors:Christian Merz, Lukas Schach, Marie Luisa Fiedler, Jean-Luc Lugrin, Carolin Wienrich, Marc Erich Latoschik
Abstract:
This paper introduces an unobtrusive in-situ measurement method to detect user behavior changes during arbitrary exposures in XR systems. Here, such behavior changes are typically associated with the Proteus effect or bodily affordances elicited by different avatars that the users embody in XR. We present a biometric user model based on deep metric similarity learning, which uses high-dimensional embeddings as reference vectors to identify behavior changes of individual users. We evaluate our model against two alternative approaches: a (non-learned) motion analysis based on central tendencies of movement patterns and subjective post-exposure embodiment questionnaires frequently used in various XR exposures. In a within-subject study, participants performed a fruit collection task while embodying avatars of different body heights (short, actual-height, and tall). Subjective assessments confirmed the effective manipulation of perceived body schema, while the (non-learned) objective analyses of head and hand movements revealed significant differences across conditions. Our similarity learning model trained on the motion data successfully identified the elicited behavior change for various query and reference data pairings of the avatar conditions. The approach has several advantages in comparison to existing methods: 1) In-situ measurement without additional user input, 2) generalizable and scalable motion analysis for various use cases, 3) user-specific analysis on the individual level, and 4) with a trained model, users can be added and evaluated in real time to study how avatar changes affect behavior.
Authors:Xianghan Wang, Chingshuan Hsiao, Shimei Qiu
Abstract:
Promisedland is a mixed-reality (MR) narrative attraction that combines cultural storytelling, ecological education, and an innovative hybrid production workflow. Set in a future Earth suffering from elemental imbalance, users embark on an interactive journey guided by symbolic characters to restore harmony through the collection of five classical elements: metal, wood, water, fire, and earth. To prototype this experience, we introduce a low-cost, high-fidelity Diorama-to-Virtual pipeline - handcrafting physical scale models, 3D scanning, and integrating them into Unreal Engine. This process enables rapid spatial prototyping while preserving the material expressiveness and narrative consistency of the physical environment. To further enhance immersion, the experience incorporates a Stewart Platform to provide motion feedback synchronized with the virtual ride dynamics, reinforcing spatial presence and embodied engagement. The final prototype runs on Meta Quest, supporting dynamic interactions and real-time visual feedback. Promisedland offers a replicable design blueprint for future XR narrative installations across museums, cultural exhibitions, and themed entertainment. It proposes a new framework for XR Narrative Attractions - where physical and digital elements converge to deepen immersion, agency, and emotional engagement.
Authors:Sylvie Delacroix, Diana Robinson, Umang Bhatt, Jacopo Domenicucci, Jessica Montgomery, Gael Varoquaux, Carl Henrik Ek, Vincent Fortuin, Yulan He, Tom Diethe, Neill Campbell, Mennatallah El-Assady, Soren Hauberg, Ivana Dusparic, Neil Lawrence
Abstract:
The growing integration of large language models across professional domains transforms how experts make critical decisions in healthcare, education, and law. While significant research effort focuses on getting these systems to communicate their outputs with probabilistic measures of reliability, many consequential forms of uncertainty in professional contexts resist such quantification. A physician pondering the appropriateness of documenting possible domestic abuse, a teacher assessing cultural sensitivity, or a mathematician distinguishing procedural from conceptual understanding face forms of uncertainty that cannot be reduced to percentages. This paper argues for moving beyond simple quantification toward richer expressions of uncertainty essential for beneficial AI integration. We propose participatory refinement processes through which professional communities collectively shape how different forms of uncertainty are communicated. Our approach acknowledges that uncertainty expression is a form of professional sense-making that requires collective development rather than algorithmic optimization.
Authors:Xuetong Wang, Ching Christie Pang, Pan Hui
Abstract:
Virtual assistants (VAs) have become ubiquitous in daily life, integrated into smartphones and smart devices, sparking interest in AI companions that enhance user experiences and foster emotional connections. However, existing companions are often embedded in specific objects-such as glasses, home assistants, or dolls-requiring users to form emotional bonds with unfamiliar items, which can lead to reduced engagement and feelings of detachment. To address this, we introduce Talking Spell, a wearable system that empowers users to imbue any everyday object with speech and anthropomorphic personas through a user-centric radiative network. Leveraging advanced computer vision (e.g., YOLOv11 for object detection), large vision-language models (e.g., QWEN-VL for persona generation), speech-to-text and text-to-speech technologies, Talking Spell guides users through three stages of emotional connection: acquaintance, familiarization, and bonding. We validated our system through a user study involving 12 participants, utilizing Talking Spell to explore four interaction intentions: entertainment, companionship, utility, and creativity. The results demonstrate its effectiveness in fostering meaningful interactions and emotional significance with everyday objects. Our findings indicate that Talking Spell creates engaging and personalized experiences, as demonstrated through various devices, ranging from accessories to essential wearables.
Authors:Stefan Schiffer, Anna Milena Rothermel, Alexander Ferrein, Astrid Rosenthal-von der Pütten
Abstract:
In this paper we present an analysis of technological and psychological factors of applying artificial intelligence (AI) at the work place. We do so for a number of twelve application cases in the context of a project where AI is integrated at work places and in work systems of the future. From a technological point of view we mainly look at the areas of AI that the applications are concerned with. This allows to formulate recommendations in terms of what to look at in developing an AI application and what to pay attention to with regards to building AI literacy with different stakeholders using the system. This includes the importance of high-quality data for training learning-based systems as well as the integration of human expertise, especially with knowledge-based systems. In terms of the psychological factors we derive research questions to investigate in the development of AI supported work systems and to consider in future work, mainly concerned with topics such as acceptance, openness, and trust in an AI system.
Authors:Dragan Ahmetovic, Matteo Manzoni, Filippo Corti, Sergio Mascetti
Abstract:
Shared control is a form of video gaming accessibility support that allows players with disabilities to delegate inaccessible controls to another person. Through interviews involving 14 individuals with lived experience of accessible gaming in shared control, we explore the ways in which shared control technologies are adopted in practice, the accessibility challenges they address, and how the support currently provided in shared control can be automated to remove the need for a human assistant. Findings indicate that shared control is essential for enabling access to otherwise inaccessible games, but its reliance on human support is a key limitation. Participants welcomed the idea of automating the support with software agents, while also identifying limitations and design requirements. Accordingly, this work contributes insights into current practices and proposes guidelines for developing automated support systems.
Authors:Ane San Martin, Michael Hagenow, Julie Shah, Johan Kildal, Elena Lazkano
Abstract:
As robot technology advances, collaboration between humans and robots will become more prevalent in industrial tasks. When humans run into issues in such scenarios, a likely future involves relying on artificial agents or robots for aid. This study identifies key aspects for the design of future user-assisting agents. We analyze quantitative and qualitative data from a user study examining the impact of on-demand assistance received from a remote human in a human-robot collaboration (HRC) assembly task. We study scenarios in which users require help and we assess their experiences in requesting and receiving assistance. Additionally, we investigate participants' perceptions of future non-human assisting agents and whether assistance should be on-demand or unsolicited. Through a user study, we analyze the impact that such design decisions (human or artificial assistant, on-demand or unsolicited help) can have on elicited emotional responses, productivity, and preferences of humans engaged in HRC tasks.
Authors:Junxiang Liu, Junming Lin, Jiangtong Li, Jie Li
Abstract:
Reconstruction dynamic visual scenes from electroencephalography (EEG) signals remains a primary challenge in brain decoding, limited by the low spatial resolution of EEG, a temporal mismatch between neural recordings and video dynamics, and the insufficient use of semantic information within brain activity. Therefore, existing methods often inadequately resolve both the dynamic coherence and the complex semantic context of the perceived visual stimuli. To overcome these limitations, we introduce DynaMind, a novel framework that reconstructs video by jointly modeling neural dynamics and semantic features via three core modules: a Regional-aware Semantic Mapper (RSM), a Temporal-aware Dynamic Aligner (TDA), and a Dual-Guidance Video Reconstructor (DGVR). The RSM first utilizes a regional-aware encoder to extract multimodal semantic features from EEG signals across distinct brain regions, aggregating them into a unified diffusion prior. In the mean time, the TDA generates a dynamic latent sequence, or blueprint, to enforce temporal consistency between the feature representations and the original neural recordings. Together, guided by the semantic diffusion prior, the DGVR translates the temporal-aware blueprint into a high-fidelity video reconstruction. On the SEED-DV dataset, DynaMind sets a new state-of-the-art (SOTA), boosting reconstructed video accuracies (video- and frame-based) by 12.5 and 10.3 percentage points, respectively. It also achieves a leap in pixel-level quality, showing exceptional visual fidelity and temporal coherence with a 9.4% SSIM improvement and a 19.7% FVMD reduction. This marks a critical advancement, bridging the gap between neural dynamics and high-fidelity visual semantics.
Authors:Felipe Arias-Russi, Yuanchen Bai, Angelique Taylor
Abstract:
The human-robot interaction (HRI) field has traditionally used Wizard-of-Oz (WoZ) controlled robots to explore navigation, conversational dynamics, human-in-the-loop interactions, and more to explore appropriate robot behaviors in everyday settings. However, existing WoZ tools are often limited to one context, making them less adaptable across different settings, users, and robotic platforms. To mitigate these issues, we introduce a Context-Adaptable Robot Interface System (CARIS) that combines advanced robotic capabilities such teleoperation, human perception, human-robot dialogue, and multimodal data recording. Through pilot studies, we demonstrate the potential of CARIS to WoZ control a robot in two contexts: 1) mental health companion and as a 2) tour guide. Furthermore, we identified areas of improvement for CARIS, including smoother integration between movement and communication, clearer functionality separation, recommended prompts, and one-click communication options to enhance the usability wizard control of CARIS. This project offers a publicly available, context-adaptable tool for the HRI community, enabling researchers to streamline data-driven approaches to intelligent robot behavior.
Authors:Yuqing Wei, Yupeng Wang, Jiayi Zhao, Yanjun Liu, Huxin Gao, Jiewen Lai
Abstract:
Extended reality (XR), encompassing Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR), is emerging as a transformative platform for medical education. Traditional methods such as textbooks, physical models, and cadaveric dissections often lack interactivity and fail to convey complex spatial relationships effectively. The emerging MR technology addresses these limitations by providing immersive environments that blend virtual elements with real-world contexts. This study presents an MR application for head anatomy education, enabling learners to intuitively interact with see-through 3D anatomical structures via hand gestures and controllers. Our hierarchical information design supports progressive learning, guiding users from basic anatomical labels to detailed structural insights. Additionally, the system incorporates an automatic calibration module that aligns virtual anatomical models with a real human head, thereby facilitating realistic human-model interactions. Experiments show that the system can effectively match the anatomical model with real-time scenes, thus enhancing the interactivity and immersion of medical education, providing an innovative tool for teaching anatomy.
Authors:Maya Guhan, Meghan E. Hurley, Eric A. Storch, John Herrington, Casey Zampella, Julia Parish-Morris, Gabriel Lázaro-Muñoz, Kristin Kostick-Quenet
Abstract:
Artificial intelligence (AI)-based computer perception (CP) technologies use mobile sensors to collect behavioral and physiological data for clinical decision-making. These tools can reshape how clinical knowledge is generated and interpreted. However, effective integration of these tools into clinical workflows depends on how developers balance clinical utility with user acceptability and trustworthiness. Our study presents findings from 20 in-depth interviews with developers of AI-based CP tools. Interviews were transcribed and inductive, thematic analysis was performed to identify 4 key design priorities: 1) to account for context and ensure explainability for both patients and clinicians; 2) align tools with existing clinical workflows; 3) appropriately customize to relevant stakeholders for usability and acceptability; and 4) push the boundaries of innovation while aligning with established paradigms. Our findings highlight that developers view themselves as not merely technical architects but also ethical stewards, designing tools that are both acceptable by users and epistemically responsible (prioritizing objectivity and pushing clinical knowledge forward). We offer the following suggestions to help achieve this balance: documenting how design choices around customization are made, defining limits for customization choices, transparently conveying information about outputs, and investing in user training. Achieving these goals will require interdisciplinary collaboration between developers, clinicians, and ethicists.
Authors:Sanaz Motamedi, Viktoria Marcus, Griffin Pitts
Abstract:
Road traffic remains a leading cause of death worldwide, with pedestrians and other vulnerable road users accounting for over half of the 1.19 million annual fatalities, much of it due to human error. Level-5 automated driving systems (ADSs), capable of full self-driving without human oversight, have the potential to reduce these incidents. However, their effectiveness depends not only on automation performance but also on their ability to communicate intent and coordinate safely with pedestrians in the absence of traditional driver cues. Understanding how pedestrians interpret and respond to ADS behavior is therefore critical to the development of connected vehicle systems. This study extends the Theory of Planned Behavior (TPB) by incorporating four external factors (i.e. safety, trust, compatibility, and understanding) to model pedestrian decision-making in road-crossing scenarios involving level-5 ADSs. Using data from an online survey (n = 212), results show that perceived behavioral control, attitude, and social information significantly predict pedestrians' crossing intentions. External factors, particularly perceived safety and understanding, strongly influence these constructs. Findings provide actionable insights for designing external human-machine interfaces (eHMIs) and cooperative V2X communication strategies that support safe, transparent interactions between automated vehicles and pedestrians. This work contributes to the development of inclusive, human-centered connected mobility systems.
Authors:Yue Yang, Xue Xie, Xinkai Wang, Hui Zhang, Chiming Yu, Xiaoxian Xiong, Lifeng Zhu, Yuanyi Zheng, Jue Cen, Bruce Daniel, Fred Baik
Abstract:
Optical see-through augmented reality (OST-AR) systems like Microsoft HoloLens 2 hold promise for arm's distance guidance (e.g., surgery), but depth perception of the hologram and occlusion of real instruments remain challenging. We present an evaluation of how visualizing the target object with different transparencies and visualizing a tracked tool (virtual proxy vs. real tool vs. no tool tracking) affects depth perception and system usability. Ten participants performed two experiments on HoloLens 2. In Experiment 1, we compared high-transparency vs. low-transparency target rendering in a depth matching task at arm's length. In Experiment 2, participants performed a simulated surgical pinpoint task on a frontal bone target under six visualization conditions ($2 \times 3$: two target transparencies and three tool visualization modes: virtual tool hologram, real tool, or no tool tracking). We collected data on depth matching error, target localization error, system usability, task workload, and qualitative feedback. Results show that a more opaque target yields significantly lower depth estimation error than a highly transparent target at arm's distance. Moreover, showing the real tool (occluding the virtual target) led to the highest accuracy and usability with the lowest workload, while not tracking the tool yielded the worst performance and user ratings. However, making the target highly transparent, while allowing the real tool to remain visible, slightly impaired depth cues and did not improve usability. Our findings underscore that correct occlusion cues, rendering virtual content opaque and occluding it with real tools in real time, are critical for depth perception and precision in OST-AR. Designers of arm-distance AR systems should prioritize robust tool tracking and occlusion handling; if unavailable, cautiously use transparency to balance depth perception and tool visibility.
Authors:Gerile Aodeng, Guozheng Li, Yunshan Feng, Qiyang Chen, Yu Zhang, Chi Harold Liu
Abstract:
Insights in tabular data capture valuable patterns that help analysts understand critical information. Organizing related insights into visual data stories is crucial for in-depth analysis. However, constructing such stories is challenging because of the complexity of the inherent relations between extracted insights. Users face difficulty sifting through a vast number of discrete insights to integrate specific ones into a unified narrative that meets their analytical goals. Existing methods either heavily rely on user expertise, making the process inefficient, or employ automated approaches that cannot fully capture their evolving goals. In this paper, we introduce InReAcTable, a framework that enhances visual data story construction by establishing both structural and semantic connections between data insights. Each user interaction triggers the Acting module, which utilizes an insight graph for structural filtering to narrow the search space, followed by the Reasoning module using the retrieval-augmented generation method based on large language models for semantic filtering, ultimately providing insight recommendations aligned with the user's analytical intent. Based on the InReAcTable framework, we develop an interactive prototype system that guides users to construct visual data stories aligned with their analytical requirements. We conducted a case study and a user experiment to demonstrate the utility and effectiveness of the InReAcTable framework and the prototype system for interactively building visual data stories.
Authors:Jaewook Lee, Davin Win Kyi, Leejun Kim, Jenny Peng, Gagyeom Lim, Jeremy Zhengqi Huang, Dhruv Jain, Jon E. Froehlich
Abstract:
Augmented reality (AR) has shown promise for supporting Deaf and hard-of-hearing (DHH) individuals by captioning speech and visualizing environmental sounds, yet existing systems do not allow users to create personalized sound visualizations. We present SonoCraftAR, a proof-of-concept prototype that empowers DHH users to author custom sound-reactive AR interfaces using typed natural language input. SonoCraftAR integrates real-time audio signal processing with a multi-agent LLM pipeline that procedurally generates animated 2D interfaces via a vector graphics library. The system extracts the dominant frequency of incoming audio and maps it to visual properties such as size and color, making the visualizations respond dynamically to sound. This early exploration demonstrates the feasibility of open-ended sound-reactive AR interface authoring and discusses future opportunities for personalized, AI-assisted tools to improve sound accessibility.
Authors:Hsuan-Kung Yang, Tsu-Ching Hsiao, Ryoichiro Oka, Ryuya Nishino, Satoko Tofukuji, Norimasa Kobori
Abstract:
Delivering intelligent and adaptive navigation assistance in augmented reality (AR) requires more than visual cues, as it demands systems capable of interpreting flexible user intent and reasoning over both spatial and semantic context. Prior AR navigation systems often rely on rigid input schemes or predefined commands, which limit the utility of rich building data and hinder natural interaction. In this work, we propose an embodied AR navigation system that integrates Building Information Modeling (BIM) with a multi-agent retrieval-augmented generation (RAG) framework to support flexible, language-driven goal retrieval and route planning. The system orchestrates three language agents, Triage, Search, and Response, built on large language models (LLMs), which enables robust interpretation of open-ended queries and spatial reasoning using BIM data. Navigation guidance is delivered through an embodied AR agent, equipped with voice interaction and locomotion, to enhance user experience. A real-world user study yields a System Usability Scale (SUS) score of 80.5, indicating excellent usability, and comparative evaluations show that the embodied interface can significantly improves users' perception of system intelligence. These results underscore the importance and potential of language-grounded reasoning and embodiment in the design of user-centered AR navigation systems.
Authors:Haojun Zhuang, Dünya Baradari, Nataliya Kosmyna, Arnav Balyan, Constanze Albrecht, Stephanie Chen, Pattie Maes
Abstract:
Humans regularly navigate an overwhelming amount of information via text media, whether reading articles, browsing social media, or interacting with chatbots. Confusion naturally arises when new information conflicts with or exceeds a reader's comprehension or prior knowledge, posing a challenge for learning. In this study, we present a multimodal investigation of reading-induced confusion using EEG and eye tracking. We collected neural and gaze data from 11 adult participants as they read short paragraphs sampled from diverse, real-world sources. By isolating the N400 event-related potential (ERP), a well-established neural marker of semantic incongruence, and integrating behavioral markers from eye tracking, we provide a detailed analysis of the neural and behavioral correlates of confusion during naturalistic reading. Using machine learning, we show that multimodal (EEG + eye tracking) models improve classification accuracy by 4-22% over unimodal baselines, reaching an average weighted participant accuracy of 77.3% and a best accuracy of 89.6%. Our results highlight the dominance of the brain's temporal regions in these neural signatures of confusion, suggesting avenues for wearable, low-electrode brain-computer interfaces (BCI) for real-time monitoring. These findings lay the foundation for developing adaptive systems that dynamically detect and respond to user confusion, with potential applications in personalized learning, human-computer interaction, and accessibility.
Authors:Henrik Voigt, Yurina Sugamiya, Kai Lawonn, Sina ZarrieÃ, Atsuo Takanishi
Abstract:
Objective Structured Clinical Examinations (OSCEs) are essential for medical training, but they require significant resources, including professional actors and expert medical feedback. Although Large Language Models (LLMs) have introduced text-based virtual patients for communication practice, these simulations often lack the capability for richer, non-textual interactions. This paper presents a novel framework that significantly enhances LLM-based simulated patients by equipping them with action spaces, thereby enabling more realistic and dynamic patient behaviors that extend beyond text. Furthermore, our system incorporates virtual tutors that provide students with instant, personalized feedback on their performance at any time during these simulated encounters. We have conducted a rigorous evaluation of the framework's real-time performance, including system latency and component accuracy. Preliminary evaluations with medical experts assessed the naturalness and coherence of the simulated patients, as well as the usefulness and appropriateness of the virtual tutor's assessments. This innovative system provides medical students with a low-cost, accessible platform for personalized OSCE preparation at home.
Authors:Xuetong Wang, Ching Christie Pang, Pan Hui
Abstract:
Human-AI romantic relationships have gained wide popularity among social media users in China. The technological impact on romantic relationships and its potential applications have long drawn research attention to topics such as relationship preservation and negativity mitigation. Media and communication studies also explore the practices in romantic para-social relationships. Nonetheless, this emerging human-AI romantic relationship, whether the relations fall into the category of para-social relationship together with its navigation pattern, remains unexplored, particularly in the context of relational stages and emotional attachment. This research thus seeks to fill this gap by presenting a mixed-method approach on 1,766 posts and 60,925 comments from Xiaohongshu, as well as the semi-structured interviews with 23 participants, of whom one of them developed her relationship with self-created AI for three years. The findings revealed that the users' willingness to self-disclose to AI companions led to increased positivity without social stigma. The results also unveiled the reciprocal nature of these interactions, the dominance of 'self', and raised concerns about language misuse, bias, and data security in AI communication.
Authors:Tyler Schroder, Renee Sirbu, Sohee Park, Jessica Morley, Sam Street, Luciano Floridi
Abstract:
Brain-computer interfaces (BCIs) show enormous potential for advancing personalized medicine. However, BCIs also introduce new avenues for cyber-attacks or security compromises. In this article, we analyze the problem and make recommendations for device manufacturers to better secure devices and to help regulators understand where more guidance is needed to protect patient safety and data confidentiality. Device manufacturers should implement the prior suggestions in their BCI products. These recommendations help protect BCI users from undue risks, including compromised personal health and genetic information, unintended BCI-mediated movement, and many other cybersecurity breaches. Regulators should mandate non-surgical device update methods, strong authentication and authorization schemes for BCI software modifications, encryption of data moving to and from the brain, and minimize network connectivity where possible. We also design a hypothetical, average-case threat model that identifies possible cybersecurity threats to BCI patients and predicts the likeliness of risk for each category of threat. BCIs are at less risk of physical compromise or attack, but are vulnerable to remote attack; we focus on possible threats via network paths to BCIs and suggest technical controls to limit network connections.
Authors:Danny Scott, William LaForest, Hritom Das, Ioannis Polykretis, Catherine D. Schuman, Charles Rizzo, James Plank, Sai Swaminathan
Abstract:
The deployment of dense, low-cost sensors is critical for realizing ubiquitous smart environments. However, existing sensing solutions struggle with the energy, scalability, and reliability trade-offs imposed by battery maintenance, wireless transmission overhead, and data processing complexity. In this work, we present Vibe2Spike, a novel battery-free, wireless sensing framework that enables vibration-based activity recognition using visible light communication (VLC) and spiking neural networks (SNNs). Our system uses ultra-low-cost tags composed only of a piezoelectric disc, a Zener diode, and an LED, which harvest vibration energy and emit sparse visible light spikes without requiring batteries or RF radios. These optical spikes are captured by event cameras and classified using optimized SNN models evolved via the EONS framework. We evaluate Vibe2Spike across five device classes, achieving 94.9\% average classification fitness while analyzing the latency-accuracy trade-offs of different temporal binning strategies. Vibe2Spike demonstrates a scalable, and energy-efficient approach for enabling intelligent environments in a batteryless manner.
Authors:Andrea Gargano, Jasin Machkour, Mimma Nardelli, Enzo Pasquale Scilingo, Michael Muma
Abstract:
In Affective Computing, a key challenge lies in reliably linking subjective emotional experiences with objective physiological markers. This preliminary study addresses the issue of reproducibility by identifying physiological features from cardiovascular and electrodermal signals that are associated with continuous self-reports of arousal levels. Using the Continuously Annotated Signal of Emotion dataset, we analyzed 164 features extracted from cardiac and electrodermal signals of 30 participants exposed to short emotion-evoking videos. Feature selection was performed using the Terminating-Random Experiments (T-Rex) method, which performs variable selection systematically controlling a user-defined target False Discovery Rate. Remarkably, among all candidate features, only two electrodermal-derived features exhibited reproducible and statistically significant associations with arousal, achieving a 100\% confirmation rate. These results highlight the necessity of rigorous reproducibility assessments in physiological features selection, an aspect often overlooked in Affective Computing. Our approach is particularly promising for applications in safety-critical environments requiring trustworthy and reliable white box models, such as mental disorder recognition and human-robot interaction systems.
Authors:Chao-Jung Lai, Mauricio Sousa, Tianyu Zhang, Ludwig Sidenmark, Tovi Grossman
Abstract:
Selection is a fundamental task that is challenging in virtual reality due to issues such as distant and small targets, occlusion, and target-dense environments. Previous research has tackled these challenges through various selection techniques, but complicates selection and can be seen as tedious outside of their designed use case. We present Adaptique, an adaptive model that infers and switches to the most optimal selection technique based on user and environmental information. Adaptique considers contextual information such as target size, distance, occlusion, and user posture combined with four objectives: speed, accuracy, comfort, and familiarity which are based on fundamental predictive models of human movement for technique selection. This enables Adaptique to select simple techniques when they are sufficiently efficient and more advanced techniques when necessary. We show that Adaptique is more preferred and performant than single techniques in a user study, and demonstrate Adaptique's versatility in an application.
Authors:Mohammed Alsobay, David M. Rothschild, Jake M. Hofman, Daniel G. Goldstein
Abstract:
Group decision-making often suffers from uneven information sharing, hindering decision quality. While large language models (LLMs) have been widely studied as aids for individuals, their potential to support groups of users, potentially as facilitators, is relatively underexplored. We present a pre-registered randomized experiment with 1,475 participants assigned to 281 five-person groups completing a hidden profile task--selecting an optimal city for a hypothetical sporting event--under one of four facilitation conditions: no facilitation, a one-time message prompting information sharing, a human facilitator, or an LLM (GPT-4o) facilitator. We find that LLM facilitation increases information shared within a discussion by raising the minimum level of engagement with the task among group members, and that these gains come at limited cost in terms of participants' attitudes towards the task, their group, or their facilitator. Whether by human or AI, there is no significant effect of facilitation on the final decision outcome, suggesting that even substantial but partial increases in information sharing are insufficient to overcome the hidden profile effect studied. To support further research into how LLM-based interfaces can support the future of collaborative decision making, we release our experimental platform, the Group-AI Interaction Laboratory (GRAIL), as an open-source tool.
Authors:Laura Spillner, Rachel Ringe, Robert Porzel, Rainer Malaka
Abstract:
In the context of AI-based decision support systems, explanations can help users to judge when to trust the AI's suggestion, and when to question it. In this way, human oversight can prevent AI errors and biased decision-making. However, this rests on the assumption that users will consider explanations in enough detail to be able to catch such errors. We conducted an online study on trust in explainable DSS, and were surprised to find that in many cases, participants spent little time on the explanation and did not always consider it in detail. We present an exploratory analysis of this data, investigating what factors impact how carefully study participants consider AI explanations, and how this in turn impacts whether they are open to changing their mind based on what the AI suggests.
Authors:Mingyang Su, Chao Liu, Jingling Zhang, WU Shuang, Mingming Fan
Abstract:
Offering diverse perspectives on a museum artifact can deepen visitors' understanding and help avoid the cognitive limitations of a single narrative, ultimately enhancing their overall experience. Physical museums promote diversity through visitor interactions. However, it remains a challenge to present multiple voices appropriately while attracting and sustaining a visitor's attention in the virtual museum. Inspired by recent studies that show the effectiveness of LLM-powered multi-agents in presenting different opinions about an event, we propose SimViews, an interactive multi-agent system that simulates visitor-to-visitor conversational patterns to promote the presentation of diverse perspectives. The system employs LLM-powered multi-agents that simulate virtual visitors with different professional identities, providing diverse interpretations of artifacts. Additionally, we constructed 4 conversational patterns between users and agents to simulate visitor interactions. We conducted a within-subject study with 20 participants, comparing SimViews to a traditional single-agent condition. Our results show that SimViews effectively facilitates the presentation of diverse perspectives through conversations, enhancing participants' understanding of viewpoints and engagement within the virtual museum.
Authors:Sefat E. Rahman, Tushar M. Athawale, Paul Rosen
Abstract:
Reeb graphs are an important tool for abstracting and representing the topological structure of a function defined on a manifold. We have identified three properties for faithfully representing Reeb graphs in a visualization: they should be constrained to the boundary, compact, and aligned with the function gradient. Existing algorithms for drawing Reeb graphs are agnostic to or violate these properties. In this paper, we introduce an algorithm to generate Reeb graph visualizations, called GASP, that is cognizant of these properties, thereby producing visualizations that are more representative of the underlying data. To demonstrate the improvements, the resulting Reeb graphs are evaluated both qualitatively and quantitatively against the geometric barycenter algorithm, using its implementation available in the Topology ToolKit (TTK), a widely adopted tool for calculating and visualizing Reeb graphs.
Authors:Federico Scarì, Olger Siebinga, Arkady Zgonnikov
Abstract:
As automated vehicles (AVs) increasingly integrate into mixed-traffic environments, evaluating their interaction with human-driven vehicles (HDVs) becomes critical. In most research focused on developing new AV control algorithms (controllers), the performance of these algorithms is assessed solely based on performance metrics such as collision avoidance or lane-keeping efficiency, while largely overlooking the human-centred dimensions of interaction with HDVs. This paper proposes a structured evaluation framework that addresses this gap by incorporating metrics grounded in the human-robot interaction literature. The framework spans four key domains: a) interaction effect, b) interaction perception, c) interaction effort, and d) interaction ability. These domains capture both the performance of the AV and its impact on human drivers around it. To demonstrate the utility of the framework, we apply it to a case study evaluating how a state-of-the-art AV controller interacts with human drivers in a merging scenario in a driving simulator. Measuring HDV-HDV interactions as a baseline, this study included one representative metric per domain: a) perceived safety, b) subjective ratings, specifically how participants perceived the other vehicle's driving behaviour (e.g., aggressiveness or predictability) , c) driver workload, and d) merging success. The results showed that incorporating metrics covering all four domains in the evaluation of AV controllers can illuminate critical differences in driver experience when interacting with AVs. This highlights the need for a more comprehensive evaluation approach. Our framework offers researchers, developers, and policymakers a systematic method for assessing AV behaviour beyond technical performance, fostering the development of AVs that are not only functionally capable but also understandable, acceptable, and safe from a human perspective.
Authors:Lyria Team, Antoine Caillon, Brian McWilliams, Cassie Tarakajian, Ian Simon, Ilaria Manco, Jesse Engel, Noah Constant, Yunpeng Li, Timo I. Denk, Alberto Lalama, Andrea Agostinelli, Cheng-Zhi Anna Huang, Ethan Manilow, George Brower, Hakan Erdogan, Heidi Lei, Itai Rolnick, Ivan Grishchenko, Manu Orsini, Matej Kastelic, Mauricio Zuluaga, Mauro Verzetti, Michael Dooley, Ondrej Skopek, Rafael Ferrer, Zalán Borsos, Ãaron van den Oord, Douglas Eck, Eli Collins, Jason Baldridge, Tom Hume, Chris Donahue, Kehang Han, Adam Roberts
Abstract:
We introduce a new class of generative models for music called live music models that produce a continuous stream of music in real-time with synchronized user control. We release Magenta RealTime, an open-weights live music model that can be steered using text or audio prompts to control acoustic style. On automatic metrics of music quality, Magenta RealTime outperforms other open-weights music generation models, despite using fewer parameters and offering first-of-its-kind live generation capabilities. We also release Lyria RealTime, an API-based model with extended controls, offering access to our most powerful model with wide prompt coverage. These models demonstrate a new paradigm for AI-assisted music creation that emphasizes human-in-the-loop interaction for live music performance.
Authors:Ze Gao, Mengyao Guo, Zheng Wang, Xiaolin Zhang, Sihuang Man
Abstract:
Digital ecological art represents an emergent frontier where biological media converge with virtual environments. This study examines the paradigm shift from anthropocentric to plant-centered artistic narratives within the metaverse, contextualizing how digital platforms transform ecological expression. However, current frameworks fail to systematically guide artists in leveraging plant agency for digital symbiosis that transcends human-centered creation. We propose the Biocentric-Creation Transformation Ideology (BCTI) framework and validate it through multimodal case studies spanning bio-art, NFTs, and VR ecosystems (2013-2023). Our analysis reveals: (1) Metaverse ecosystems enable unprecedented plant-algorithm co-creation, with biological artworks increasing by 133% in premier archives (2020 vs 2013); (2) Digital symbiosis manifests through blockchain DAOs where plants govern human-plant collaborations; (3) Algorithmic photosynthesis in VR environments reshapes ecological aesthetics through real-time biodata translation. The BCTI framework advances ecological art theory by systematizing the transition from representation to plant-centered agency, offering artists a blueprint for post-anthropocene creation. This redefines environmental consciousness in virtual realms while establishing new protocols for cross-species digital collaboration.
Authors:Zach Cutler, Jack Wilburn, Hilson Shrestha, Yiren Ding, Brian Bollen, Khandaker Abrar Nadib, Tingying He, Andrew McNutt, Lane Harrison, Alexander Lex
Abstract:
Online user studies of visualizations, visual encodings, and interaction techniques are ubiquitous in visualization research. Yet, designing, conducting, and analyzing studies effectively is still a major burden. Although various packages support such user studies, most solutions address only facets of the experiment life cycle, make reproducibility difficult, or do not cater to nuanced study designs or interactions. We introduce reVISit 2, a software framework that supports visualization researchers at all stages of designing and conducting browser-based user studies. ReVISit supports researchers in the design, debug & pilot, data collection, analysis, and dissemination experiment phases by providing both technical affordances (such as replay of participant interactions) and sociotechnical aids (such as a mindfully maintained community of support). It is a proven system that can be (and has been) used in publication-quality studies -- which we demonstrate through a series of experimental replications. We reflect on the design of the system via interviews and an analysis of its technical dimensions. Through this work, we seek to elevate the ease with which studies are conducted, improve the reproducibility of studies within our community, and support the construction of advanced interactive studies.
Authors:Pavlos Panagiotidis, Victor Zhi Heung Ngo, Sean Myatt, Roma Patel, Rachel Ramchurn, Alan Chamberlain, Ayse Kucukyilmaz
Abstract:
In this paper, we propose theatre-in-the-loop, a framework for developing expressive robot behaviours tailored to artistic performance through a director-guided puppeteering workflow. Leveraging theatrical methods, we use narrative objectives to direct a puppeteer in generating improvised robotic gestures that convey specific emotions. These improvisations are captured and curated to build a dataset of reusable movement templates for standalone playback in future autonomous performances. Initial trials demonstrate the feasibility of this approach, illustrating how the workflow enables precise sculpting of robotic gestures into coherent emotional arcs while revealing challenges posed by the robot's mechanical constraints. We argue that this practice-led framework provides a model for interdisciplinary teams creating socially expressive robot behaviours, contributing to (1) theatre as an interactive training ground for human-robot interaction and (2) co-creation methodologies between humans and machines.
Authors:Pavlo Bazilinskyy, Francesco Walker, Debargha Dey, Tram Thi Minh Tran, Hyungchai Park, Hyochang Kim, Hyunmin Kang, Patrick Ebel
Abstract:
The transition to mixed-traffic environments that involve automated vehicles, manually operated vehicles, and vulnerable road users presents new challenges for human-centered automotive research. Despite this, most studies in the domain focus on single-agent interactions. This paper reports on a participatory workshop (N = 15) and a questionnaire (N = 19) conducted during the AutomotiveUI '24 conference to explore the state of multi-agent automotive research. The participants discussed methodological challenges and opportunities in real-world settings, simulations, and computational modeling. Key findings reveal that while the value of multi-agent approaches is widely recognized, practical and technical barriers hinder their implementation. The study highlights the need for interdisciplinary methods, better tools, and simulation environments that support scalable, realistic, and ethically informed multi-agent research.
Authors:Daniel Killough, Justin Feng, Zheng Xue "ZX" Ching, Daniel Wang, Rithvik Dyava, Yapeng Tian, Yuhang Zhao
Abstract:
Virtual Reality (VR) is inaccessible to blind people. While research has investigated many techniques to enhance VR accessibility, they require additional developer effort to integrate. As such, most mainstream VR apps remain inaccessible as the industry de-prioritizes accessibility. We present VRSight, an end-to-end system that recognizes VR scenes post hoc through a set of AI models (e.g., object detection, depth estimation, LLM-based atmosphere interpretation) and generates tone-based, spatial audio feedback, empowering blind users to interact in VR without developer intervention. To enable virtual element detection, we further contribute DISCOVR, a VR dataset consisting of 30 virtual object classes from 17 social VR apps, substituting real-world datasets that remain not applicable to VR contexts. Nine participants used VRSight to explore an off-the-shelf VR app (Rec Room), demonstrating its effectiveness in facilitating social tasks like avatar awareness and available seat identification.
Authors:Amine Allouah, Omar Besbes, Josué D Figueroa, Yash Kanoria, Akshit Kumar
Abstract:
Online marketplaces will be transformed by autonomous AI agents acting on behalf of consumers. Rather than humans browsing and clicking, vision-language-model (VLM) agents can parse webpages, evaluate products, and transact. This raises a fundamental question: what do AI agents buy, and why? We develop ACES, a sandbox environment that pairs a platform-agnostic VLM agent with a fully programmable mock marketplace to study this question. We first conduct basic rationality checks in the context of simple tasks, and then, by randomizing product positions, prices, ratings, reviews, sponsored tags, and platform endorsements, we obtain causal estimates of how frontier VLMs actually shop. Models show strong but heterogeneous position effects: all favor the top row, yet different models prefer different columns, undermining the assumption of a universal "top" rank. They penalize sponsored tags and reward endorsements. Sensitivities to price, ratings, and reviews are directionally human-like but vary sharply in magnitude across models. Motivated by scenarios where sellers use AI agents to optimize product listings, we show that a seller-side agent that makes minor tweaks to product descriptions, targeting AI buyer preferences, can deliver substantial market-share gains if AI-mediated shopping dominates. We also find that modal product choices can differ across models and, in some cases, demand may concentrate on a few select products, raising competition questions. Together, our results illuminate how AI agents may behave in e-commerce settings and surface concrete seller strategy, platform design, and regulatory questions in an AI-mediated ecosystem.
Authors:Kristin M. Kostick-Quenet, Meghan E. Hurley, Syed Ayaz, John Herrington, Casey Zampella, Julia Parish-Morris, Birkan Tunç, Gabriel Lázaro-Muñoz, J. S. Blumenthal-Barby, Eric A. Storch
Abstract:
Computer perception (CP) technologies (digital phenotyping, affective computing and related passive sensing approaches) offer unprecedented opportunities to personalize healthcare, but provoke concerns about privacy, bias and the erosion of empathic, relationship-centered practice. A comprehensive understanding of perceived risks, benefits, and implementation challenges from those who design, deploy and experience these tools in real-world settings remains elusive. This study provides the first evidence-based account of key stakeholder perspectives on the relational, technical, and governance challenges raised by the integration of CP technologies into patient care. We conducted in-depth, semi-structured interviews with 102 stakeholders: adolescent patients and their caregivers, frontline clinicians, technology developers, and ethics, legal, policy or philosophy scholars. Transcripts underwent thematic analysis by a multidisciplinary team; reliability was enhanced through double coding and consensus adjudication. Stakeholders articulated seven interlocking concern domains: (1) trustworthiness and data integrity; (2) patient-specific relevance; (3) utility and workflow integration; (4) regulation and governance; (5) privacy and data protection; (6) direct and indirect patient harms; and (7) philosophical critiques of reductionism. To operationalize humanistic safeguards, we propose "personalized roadmaps": co-designed plans that predetermine which metrics will be monitored, how and when feedback is shared, thresholds for clinical action, and procedures for reconciling discrepancies between algorithmic inferences and lived experience. By translating these insights into personalized roadmaps, we offer a practical framework for developers, clinicians and policymakers seeking to harness continuous behavioral data while preserving the humanistic core of care.
Authors:Yitong Zhu, Lei Han, Guanxuan Jiang, PengYuan Zhou, Yuyang Wang
Abstract:
Multimodal emotion recognition (MER) is crucial for human-computer interaction, yet real-world challenges like dynamic modality incompleteness and asynchrony severely limit its robustness. Existing methods often assume consistently complete data or lack dynamic adaptability. To address these limitations, we propose a novel Hi-MoE~(Hierarchical Mixture-of-Experts) framework for robust continuous emotion prediction. This framework employs a dual-layer expert structure. A Modality Expert Bank utilizes soft routing to dynamically handle missing modalities and achieve robust information fusion. A subsequent Emotion Expert Bank leverages differential-attention routing to flexibly attend to emotional prototypes, enabling fine-grained emotion representation. Additionally, a cross-modal alignment module explicitly addresses temporal shifts and semantic inconsistencies between modalities. Extensive experiments on benchmark datasets DEAP and DREAMER demonstrate our model's state-of-the-art performance in continuous emotion regression, showcasing exceptional robustness under challenging conditions such as dynamic modality absence and asynchronous sampling. This research significantly advances the development of intelligent emotion systems adaptable to complex real-world environments.
Authors:Yingfan Zhou, Ester Chen, Manasa Pisipati, Aiping Xiong, Sarah Rajtmajer
Abstract:
Synthetic images, audio, and video can now be generated and edited by Artificial Intelligence (AI). In particular, the malicious use of synthetic data has raised concerns about potential harms to cybersecurity, personal privacy, and public trust. Although AI-based detection tools exist to help identify synthetic content, their limitations often lead to user mistrust and confusion between real and fake content. This study examines the role of AI performance in influencing human trust and decision making in synthetic data identification. Through an online human subject experiment involving 400 participants, we examined how varying AI performance impacts human trust and dependence on AI in deepfake detection. Our findings indicate how participants calibrate their dependence on AI based on their perceived risk and the prediction results provided by AI. These insights contribute to the development of transparent and explainable AI systems that better support everyday users in mitigating the harms of synthetic media.
Authors:Xinwu Ye, Jun-Hsiang Yao, Jielin Feng, Shuhong Mei, Xingyu Lan, Siming Chen
Abstract:
With captivating visual effects, stylized 3D character animation has gained widespread use in cinematic production, advertising, social media, and the potential development of virtual reality (VR) non-player characters (NPCs). However, animating stylized 3D characters often requires significant time and effort from animators. We propose a mixed-initiative framework and interactive system to enable stylized 3D characters to mimic motion in human videos. The framework takes a single-view human video and a stylized 3D character (the target character) as input, captures the motion of the video, and then transfers the motion to the target character. In addition, it involves two interaction modules for customizing the result. Accordingly, the system incorporates two authoring tools that empower users with intuitive modification. A questionnaire study offers tangible evidence of the framework's capability of generating natural stylized 3D character animations similar to the motion in the video. Additionally, three case studies demonstrate the utility of our approach in creating diverse results.
Authors:Sameer Neupane, Mithun Saha, David M. Almeida, Santosh Kumar
Abstract:
Understanding how frequently people experience different kinds of daily stressors is crucial for interpreting stress exposure and informing mental health care. But it can't be directly estimated from current assessment methods, such as diaries, end-of-day interviews, and ecological momentary assessments (EMA), that use sparse sampling to limit participant burden, and a structured response format for uniformity. In this paper, we utilize stressor data collected in a 100-day field study with 68 participants that adopted wearable-triggered prompts and a freeform format to solicit stressors soon after they occurred, but limited its prompts to a small subset to keep the burden low. We develop asymptotic models to estimate the latent frequency of different kinds of real-life stressors that address sample sparsity and sampling bias. We find that people experience 5.39 stressors per day, on average. The top three are related to work (1.76/day), health (0.59/day), and transportation (0.55/day). These estimates offer a principled benchmark for interpreting individual stressor loads. They can also inform mental health care treatments and interventions by establishing population-level baselines.
Authors:Emilio Barkett, Olivia Long, Paul Kröger
Abstract:
Large Language Models (LLMs) are increasingly deployed in autonomous decision-making roles across high-stakes domains. However, since models are trained on human-generated data, they may inherit cognitive biases that systematically distort human judgment, including escalation of commitment, where decision-makers continue investing in failing courses of action due to prior investment. Understanding when LLMs exhibit such biases presents a unique challenge. While these biases are well-documented in humans, it remains unclear whether they manifest consistently in LLMs or require specific triggering conditions. This paper investigates this question using a two-stage investment task across four experimental conditions: model as investor, model as advisor, multi-agent deliberation, and compound pressure scenario. Across N = 6,500 trials, we find that bias manifestation in LLMs is highly context-dependent. In individual decision-making contexts (Studies 1-2, N = 4,000), LLMs demonstrate strong rational cost-benefit logic with minimal escalation of commitment. However, multi-agent deliberation reveals a striking hierarchy effect (Study 3, N = 500): while asymmetrical hierarchies show moderate escalation rates (46.2%), symmetrical peer-based decision-making produces near-universal escalation (99.2%). Similarly, when subjected to compound organizational and personal pressures (Study 4, N = 2,000), models exhibit high degrees of escalation of commitment (68.95% average allocation to failing divisions). These findings reveal that LLM bias manifestation depends critically on social and organizational context rather than being inherent, with significant implications for the deployment of multi-agent systems and unsupervised operations where such conditions may emerge naturally.
Authors:Shengqi Zhu, Jeffrey M. Rzeszotarski, David Mimno
Abstract:
Chat logs provide a rich source of information about LLM users, but patterns of user behavior are often masked by the variability of queries. We present a new task, segmenting chat queries into contents of requests, roles, query-specific context, and additional expressions. We find that, despite the familiarity of chat-based interaction, request-making in LLM queries remains significantly different from comparable human-human interactions. With the data resource, we introduce an important perspective of diachronic analyses with user expressions. We find that query patterns vary between early ones emphasizing requests, and individual users explore patterns but tend to converge with experience. Finally, we show that model capabilities affect user behavior, particularly with the introduction of new models, which are traceable at the community level.
Authors:Yuma Akiba, Shota Nakayama, Keigo Ushiyama, Izumi Mizoguchi, Hiroyuki Kajimoto
Abstract:
This study proposes a method to present pure low-frequency vibration sensations to the face that cannot be presented by small commercially available vibrators. The core innovation lies in utilizing an amplitude modulation technique with a carrier frequency of approximately 200 Hz. Due to the absence of Pacinian corpuscles in the facial region - receptors responsible for detecting high-frequency vibrations around 200 Hz - only the original low-frequency signal is perceived. Three experiments were conducted. Experiments 1 and 2 were performed on the forehead to confirm that the proposed amplitude modulation method could produce the desired low-frequency perception and to evaluate the subjective quality of the vibration. The results suggested that the proposed method could produce the perception of desired pure low-frequency vibration when applied to the forehead. In Experiment 3, the proposed method was applied to the whole face, and its range of applicability was explored. The results indicated that the original low-frequency vibration was clearly perceptible around the eyes, cheeks, and lower lip area.
Authors:Shruthi Chari, Oshani Seneviratne, Prithwish Chakraborty, Pablo Meyer, Deborah L. McGuinness
Abstract:
Explanations are crucial for building trustworthy AI systems, but a gap often exists between the explanations provided by models and those needed by users. To address this gap, we introduce MetaExplainer, a neuro-symbolic framework designed to generate user-centered explanations. Our approach employs a three-stage process: first, we decompose user questions into machine-readable formats using state-of-the-art large language models (LLM); second, we delegate the task of generating system recommendations to model explainer methods; and finally, we synthesize natural language explanations that summarize the explainer outputs. Throughout this process, we utilize an Explanation Ontology to guide the language models and explainer methods. By leveraging LLMs and a structured approach to explanation generation, MetaExplainer aims to enhance the interpretability and trustworthiness of AI systems across various applications, providing users with tailored, question-driven explanations that better meet their needs. Comprehensive evaluations of MetaExplainer demonstrate a step towards evaluating and utilizing current state-of-the-art explanation frameworks. Our results show high performance across all stages, with a 59.06% F1-score in question reframing, 70% faithfulness in model explanations, and 67% context-utilization in natural language synthesis. User studies corroborate these findings, highlighting the creativity and comprehensiveness of generated explanations. Tested on the Diabetes (PIMA Indian) tabular dataset, MetaExplainer supports diverse explanation types, including Contrastive, Counterfactual, Rationale, Case-Based, and Data explanations. The framework's versatility and traceability from using ontology to guide LLMs suggest broad applicability beyond the tested scenarios, positioning MetaExplainer as a promising tool for enhancing AI explainability across various domains.
Authors:Wataru Kawabe, Hiroto Fukuda, Akihisa Shitara, Yuri Nakao, Yusuke Sugano
Abstract:
We introduce TofuML, an interactive system designed to make machine learning (ML) concepts more accessible and engaging for non-expert users. Unlike conventional GUI-based systems, TofuML employs a physical and spatial interface consisting of a small device and a paper mat, allowing users to train and evaluate sound classification models through intuitive, toy-like interactions. Through two user studies -- a comparative study against a GUI-based version and a public event deployment -- we investigated how TofuML impacts users' engagement in the ML model creation process, their ability to provide appropriate training data, and their conception of potential applications. Our results indicated that TofuML enhanced user engagement compared to a GUI while lowering barriers for non-experts to engage with ML. Users demonstrated creativity in conceiving diverse ML applications, revealing opportunities to optimize between conceptual understanding and user engagement. These findings contribute to developing interactive ML systems/frameworks designed for a wide range of users.
Authors:Zhanna Kaufman, Madeline Endres, Cindy Xiong Bearfield, Yuriy Brun
Abstract:
Systems relying on ML have become ubiquitous, but so has biased behavior within them. Research shows that bias significantly affects stakeholders' trust in systems and how they use them. Further, stakeholders of different backgrounds view and trust the same systems differently. Thus, how ML models' behavior is explained plays a key role in comprehension and trust. We survey explainability visualizations, creating a taxonomy of design characteristics. We conduct user studies to evaluate five state-of-the-art visualization tools (LIME, SHAP, CP, Anchors, and ELI5) for model explainability, measuring how taxonomy characteristics affect comprehension, bias perception, and trust for non-expert ML users. Surprisingly, we find an inverse relationship between comprehension and trust: the better users understand the models, the less they trust them. We investigate the cause and find that this relationship is strongly mediated by bias perception: more comprehensible visualizations increase people's perception of bias, and increased bias perception reduces trust. We confirm this relationship is causal: Manipulating explainability visualizations to control comprehension, bias perception, and trust, we show that visualization design can significantly (p < 0.001) increase comprehension, increase perceived bias, and reduce trust. Conversely, reducing perceived model bias, either by improving model fairness or by adjusting visualization design, significantly increases trust even when comprehension remains high. Our work advances understanding of how comprehension affects trust and systematically investigates visualization's role in facilitating responsible ML applications.
Authors:William Huang, Xia Su, Jon E. Froehlich, Yang Zhang
Abstract:
Assessing the accessibility of unfamiliar built environments is critical for people with disabilities. However, manual assessments, performed by users or their personal health professionals, are laborious and unscalable, while automatic machine learning methods often neglect an individual user's unique needs. Recent advances in Large Language Models (LLMs) enable novel approaches to this problem, balancing personalization with scalability to enable more adaptive and context-aware assessments of accessibility. We present Accessibility Scout, an LLM-based accessibility scanning system that identifies accessibility concerns from photos of built environments. With use, Accessibility Scout becomes an increasingly capable "accessibility scout", tailoring accessibility scans to an individual's mobility level, preferences, and specific environmental interests through collaborative Human-AI assessments. We present findings from three studies: a formative study with six participants to inform the design of Accessibility Scout, a technical evaluation of 500 images of built environments, and a user study with 10 participants of varying mobility. Results from our technical evaluation and user study show that Accessibility Scout can generate personalized accessibility scans that extend beyond traditional ADA considerations. Finally, we conclude with a discussion on the implications of our work and future steps for building more scalable and personalized accessibility assessments of the physical world.
Authors:Shingo Hattori, Takefumi Hiraki
Abstract:
Large, high-resolution displays are installed throughout the city as public displays. By superimposing invisible information on the images of these displays, large numbers of devices with cameras and sensors can communicate with the displays without prior pairing. Several applications have been proposed, such as operating robots or communicating information to users by displaying 2D codes on images. However, the display of 2D codes has the problem of compromising the appearance of displayed content.
Abe et al. proposed a method of communicating with devices by superimposing invisible information using color vibration on images displayed on off-the-shelf liquid-crystal displays (LCD). Using this method, we can embed the information for devices in images without interfering with the displayed content. Abe et al. uses a simple serial loop operation to search for color pairs comprising a color vibration, which requires a very long processing time due to the huge search space.
In this paper, we propose an accelerated and optimized search method for color pairs that constitute the imperceptible color vibration for embedding information on LCD images. To achieve fast color pair search, we parallelized the search process, which is previously done individually, by using arrays representing the amount of movement and an operation to extract elements from the array that satisfy the conditions. In addition, we investigate the amount of information that can be superimposed on nine color images using the imperceptible color vibration and clarify the applicability of embedding information into images using the color vibration.
Authors:Kexin Liang, Simeon C. Calvert, J. W. C. van Lint
Abstract:
Conditionally automated driving systems require human drivers to disengage from non-driving-related activities and resume vehicle control within limited time budgets when encountering scenarios beyond system capabilities. Ensuring safe and comfortable transitions is critical for reducing driving risks and improving user experience. However, takeovers involve complex human-vehicle interactions, resulting in substantial variability in drivers' responses, especially in takeover time, defined as the duration needed to regain control. This variability presents challenges in setting sufficient time budgets that are neither too short (risking safety and comfort) nor too long (reducing driver alertness and transition efficiency).
Although previous research has examined the role of time budgets in influencing takeover time and performance, few studies have systematically addressed how to determine sufficient time budgets that adapt to diverse scenarios and driver needs. This review supports such efforts by examining the entire takeover sequence, including takeover time, time budget, and takeover performance. Specifically, we (i) synthesize causal factors influencing takeover time and propose a taxonomy of its determinants using the task-capability interface model; (ii) review existing work on fixed and adaptive time budgets, introducing the concept of the takeover buffer to describe the gap between takeover time and allocated time budget; (iii) present a second taxonomy to support standardized and context-sensitive measurement of takeover performance; (iv) propose a conceptual model describing the relationships among takeover time, time budget, and performance; and (v) outline a research agenda with six directions.
Authors:Kexin Liang, Jan Luca Kästle, Bani Anvari, Simeon C. Calvert, J. W. C. van Lint
Abstract:
When automated driving systems encounter complex situations beyond their operational capabilities, they issue takeover requests, prompting drivers to resume vehicle control and return to the driving loop as a critical safety backup. However, this control transition places significant demands on drivers, requiring them to promptly respond to takeover requests while executing high-quality interventions. To ensure safe and comfortable control transitions, it is essential to develop a deep understanding of the key factors influencing various takeover performance aspects. This study evaluates drivers' takeover performance across three dimensions: response efficiency, user experience, and driving safety - using a driving simulator experiment. EXtreme Gradient Boosting (XGBoost) models are used to investigate the contributions of two critical factors, i.e., Situational Awareness (SA) and Spare Capacity (SC), in predicting various takeover performance metrics by comparing the predictive results to the baseline models that rely solely on basic Driver Characteristics (DC). The results reveal that (i) higher SA enables drivers to respond to takeover requests more quickly, particularly for reflexive responses; and (ii) SC shows a greater overall impact on takeover quality than SA, where higher SC generally leads to enhanced subjective rating scores and objective execution trajectories. These findings highlight the distinct yet complementary roles of SA and SC in shaping performance components, offering valuable insights for optimizing human-vehicle interactions and enhancing automated driving system design.
Authors:Yi Kong, Dianxi Shi, Guoli Yang, Zhang ke-di, Chenlin Huang, Xiaopeng Li, Songchang Jin
Abstract:
The recent advancement of autonomous agents powered by Large Language Models (LLMs) has demonstrated significant potential for automating tasks on mobile devices through graphical user interfaces (GUIs). Despite initial progress, these agents still face challenges when handling complex real-world tasks. These challenges arise from a lack of knowledge about real-life mobile applications in LLM-based agents, which may lead to ineffective task planning and even cause hallucinations. To address these challenges, we propose a novel LLM-based agent framework called MapAgent that leverages memory constructed from historical trajectories to augment current task planning. Specifically, we first propose a trajectory-based memory mechanism that transforms task execution trajectories into a reusable and structured page-memory database. Each page within a trajectory is extracted as a compact yet comprehensive snapshot, capturing both its UI layout and functional context. Secondly, we introduce a coarse-to-fine task planning approach that retrieves relevant pages from the memory database based on similarity and injects them into the LLM planner to compensate for potential deficiencies in understanding real-world app scenarios, thereby achieving more informed and context-aware task planning. Finally, planned tasks are transformed into executable actions through a task executor supported by a dual-LLM architecture, ensuring effective tracking of task progress. Experimental results in real-world scenarios demonstrate that MapAgent achieves superior performance to existing methods. The code will be open-sourced to support further research.
Authors:Yotam Sechayk, Ariel Shamir, Amy Pavel, Takeo Igarashi
Abstract:
Instructors often rely on visual actions such as pointing, marking, and sketching to convey information in educational presentation videos. These subtle visual cues often lack verbal descriptions, forcing low-vision (LV) learners to search for visual indicators or rely solely on audio, which can lead to missed information and increased cognitive load. To address this challenge, we conducted a co-design study with three LV participants and developed VeasyGuide, a tool that uses motion detection to identify instructor actions and dynamically highlight and magnify them. VeasyGuide produces familiar visual highlights that convey spatial context and adapt to diverse learners and content through extensive personalization and real-time visual feedback. VeasyGuide reduces visual search effort by clarifying what to look for and where to look. In an evaluation with 8 LV participants, learners demonstrated a significant improvement in detecting instructor actions, with faster response times and significantly reduced cognitive load. A separate evaluation with 8 sighted participants showed that VeasyGuide also enhanced engagement and attentiveness, suggesting its potential as a universally beneficial tool.
Authors:Meryem Yilmaz Soylu, Jeonghyun Lee, Jui-Tse Hung, Christopher Zhang Cui, David A. Joyner
Abstract:
As Artificial Intelligence (AI) tools become increasingly embedded in higher education, understanding how students interact with these systems is essential to supporting effective learning. This study examines how students' AI literacy and prior exposure to AI technologies shape their perceptions of Socratic Mind, an interactive AI-powered formative assessment tool. Drawing on Self-Determination Theory and user experience research, we analyze relationships among AI literacy, perceived usability, satisfaction, engagement, and perceived learning effectiveness. Data from 309 undergraduates in Computer Science and Business courses were collected through validated surveys. Partial least squares structural equation modeling showed that AI literacy - especially self-efficacy, conceptual understanding, and application skills - significantly predicts usability, satisfaction, and engagement. Usability and satisfaction, in turn, strongly predict perceived learning effectiveness, while prior AI exposure showed no significant effect. These findings highlight that AI literacy, rather than exposure alone, shapes student experiences. Designers should integrate adaptive guidance and user-centered features to support diverse literacy levels, fostering inclusive, motivating, and effective AI-based learning environments.
Authors:Laura Spillner, Nima Zargham, Mihai Pomarlan, Robert Porzel, Rainer Malaka
Abstract:
The need for explanations in AI has, by and large, been driven by the desire to increase the transparency of black-box machine learning models. However, such explanations, which focus on the internal mechanisms that lead to a specific output, are often unsuitable for non-experts. To facilitate a human-centered perspective on AI explanations, agents need to focus on individuals and their preferences as well as the context in which the explanations are given. This paper proposes a personalized approach to explanation, where the agent tailors the information provided to the user based on what is most likely pertinent to them. We propose a model of the agent's worldview that also serves as a personal and dynamic memory of its previous interactions with the same user, based on which the artificial agent can estimate what part of its knowledge is most likely new information to the user.
Authors:Viktoria Marcus, Griffin Pitts, Sanaz Motamedi
Abstract:
Each year, over half of global traffic fatalities involve vulnerable road users (e.g. pedestrians), often due to human error. Level-5 automated driving systems (ADSs) could reduce driver errors contributing to pedestrian accidents, though effectiveness depends on clarity and understandability for other road users. External human-machine interfaces (eHMIs) have been proposed to facilitate pedestrian-ADS communication, though consensus on optimal eHMI features remains unclear. In an online survey, 153 participants responded to road-crossing scenarios involving level-5 ADSs, with and without eHMIs. With eHMIs, pedestrians crossed earlier and more confidently, and reported significantly increased perceptions of safety, trust, and understanding when interacting with level-5 ADSs. Visual eHMI features (including a text display and external speedometer) were ranked more necessary than auditory ones, though auditory cues received positive feedback. This study demonstrates that eHMIs can significantly improve pedestrians' understanding of level-5 ADS intent and enhance perceived safety and trust, facilitating more intuitive pedestrian-ADS interactions.
Authors:Andreas Spilz, Heiko Oppel, Jochen Werner, Kathrin Stucke-Straub, Felix Capanni, Michael Munz
Abstract:
Wearable inertial measurement units (IMUs) offer a cost-effective and scalable means to assess human movement quality in clinical and everyday settings. However, the development of robust sensor-based classification models for physiotherapeutic exercises and gait analysis requires large, diverse datasets, which are costly and time-consuming to collect. Here, we present a multimodal dataset of physiotherapeutic exercises - including correct and clinically relevant variants - and gait-related exercises - including both normal and impaired gait patterns - recorded from 19 participants using synchronized IMUs and marker-based motion capture (MoCap). The dataset includes raw data from nine IMUs and thirty-five optical markers capturing full-body kinematics. Each IMU is additionally equipped with four optical markers, enabling precise comparison between IMU-derived orientation estimates and reference values from the MoCap system. To support further analysis, we also provide processed IMU orientations aligned with common segment coordinate systems, subject-specific OpenSim models, inverse kinematics results, and tools for visualizing IMU orientations in the musculoskeletal context. Detailed annotations of movement execution quality and time-stamped segmentations support diverse analysis goals. This dataset supports the development and benchmarking of machine learning models for tasks such as automatic exercise evaluation, gait analysis, temporal activity segmentation, and biomechanical parameter estimation. To facilitate reproducibility, we provide code for postprocessing, sensor-to-segment alignment, inverse kinematics computation, and technical validation. This resource is intended to accelerate research in machine learning-driven human movement analysis.
Authors:Edvin Teskeredzic, Muamer Paric, Adna Sestic, Petra Fribert, Anamarija Lukac, Hadzem Hadzic, Kemal Altwlkany, Emanuel Lacic
Abstract:
This paper explores the prospect of creating engaging user experiences and collecting leads through an interactive and gamified platform. We introduce Vocalize, an end-to-end system for increasing user engagement and lead acquisition through gamified voice competitions. Using audio processing techniques and LLMs, we create engaging and interactive experiences that have the potential to reach a wide audience, foster brand recognition, and increase customer loyalty. We describe the system from a technical standpoint and report results from launching Vocalize at 4 different live events. Our user study shows that Vocalize is capable of generating significant user engagement, which shows potential for gamified audio campaigns in marketing and similar verticals.
Authors:Jun-Hsiang Yao, Jielin Feng, Xinfang Tian, Kai Xu, Gulshat Amirkhanova, Siming Chen
Abstract:
Virtual Reality (VR) broadcasting has emerged as a promising medium for providing immersive viewing experiences of major sports events such as tennis. However, current VR broadcast systems often lack an effective camera language and do not adequately incorporate dynamic, in-game visualizations, limiting viewer engagement and narrative clarity. To address these limitations, we analyze 400 out-of-play segments from eight major tennis broadcasts to develop a tennis-specific design framework that effectively combines cinematic camera movements with embedded visualizations. We further refine our framework by examining 25 cinematic VR animations, comparing their camera techniques with traditional tennis broadcasts to identify key differences and inform adaptations for VR. Based on data extracted from the broadcast videos, we reconstruct a simulated game that captures the players' and ball's motion and trajectories. Leveraging this design framework and processing pipeline, we develope Beyond the Broadcast, a VR tennis viewing system that integrates embedded visualizations with adaptive camera motions to construct a comprehensive and engaging narrative. Our system dynamically overlays tactical information and key match events onto the simulated environment, enhancing viewer comprehension and narrative engagement while ensuring perceptual immersion and viewing comfort. A user study involving tennis viewers demonstrate that our approach outperforms traditional VR broadcasting methods in delivering an immersive, informative viewing experience.
Authors:Qing Dong, Pengyuan Liu, Dong Yu, Chen Kang
Abstract:
Generative agents have made significant progress in simulating human behavior, but existing frameworks often simplify emotional modeling and focus primarily on specific tasks, limiting the authenticity of the simulation. Our work proposes the Psychological-mechanism Agent (PSYA) framework, based on the Cognitive Triangle (Feeling-Thought-Action), designed to more accurately simulate human behavior. The PSYA consists of three core modules: the Feeling module (using a layer model of affect to simulate changes in short-term, medium-term, and long-term emotions), the Thought module (based on the Triple Network Model to support goal-directed and spontaneous thinking), and the Action module (optimizing agent behavior through the integration of emotions, needs and plans). To evaluate the framework's effectiveness, we conducted daily life simulations and extended the evaluation metrics to self-influence, one-influence, and group-influence, selection five classic psychological experiments for simulation. The results show that the PSYA framework generates more natural, consistent, diverse, and credible behaviors, successfully replicating human experimental outcomes. Our work provides a richer and more accurate emotional and cognitive modeling approach for generative agents and offers an alternative to human participants in psychological experiments.
Authors:Peter Neigel, David Antony Selby, Shota Arai, Benjamin Tag, Niels van Berkel, Sebastian Vollmer, Andrew Vargo, Koichi Kise
Abstract:
Wearable devices offer detailed sleep-tracking data. However, whether this information enhances our understanding of sleep or simply quantifies already-known patterns remains unclear. This work explores the relationship between subjective sleep self-assessments and sensor data from an Oura ring over 4--8 weeks in-the-wild. 29 participants rated their sleep quality daily compared to the previous night and completed a working memory task. Our findings reveal that differences in REM sleep, nocturnal heart rate, N-Back scores, and bedtimes highly predict sleep self-assessment in significance and effect size. For N-Back performance, REM sleep duration, prior night's REM sleep, and sleep self-assessment are the strongest predictors. We demonstrate that self-report sensitivity towards sleep markers differs among participants. We identify three groups, highlighting that sleep trackers provide more information gain for some users than others. Additionally, we make all experiment data publicly available.
Authors:Xinzheng Wu, Junyi Chen, Peiyi Wang, Shunxiang Chen, Haolan Meng, Yong Shen
Abstract:
In the research and development (R&D) and verification and validation (V&V) phases of autonomous driving decision-making and planning systems, it is necessary to integrate human factors to achieve decision-making and evaluation that align with human cognition. However, most existing datasets primarily focus on vehicle motion states and trajectories, neglecting human-related information. In addition, current naturalistic driving datasets lack sufficient safety-critical scenarios while simulated datasets suffer from low authenticity. To address these issues, this paper constructs the Risk-Informed Subjective Evaluation and Eye-tracking (RISEE) dataset which specifically contains human subjective evaluations and eye-tracking data apart from regular naturalistic driving trajectories. By leveraging the complementary advantages of drone-based (high realism and extensive scenario coverage) and simulation-based (high safety and reproducibility) data collection methods, we first conduct drone-based traffic video recording at a highway ramp merging area. After that, the manually selected highly interactive scenarios are reconstructed in simulation software, and drivers' first-person view (FPV) videos are generated, which are then viewed and evaluated by recruited participants. During the video viewing process, participants' eye-tracking data is collected. After data processing and filtering, 3567 valid subjective risk ratings from 101 participants across 179 scenarios are retained, along with 2045 qualified eye-tracking data segments. The collected data and examples of the generated FPV videos are available in our website.
Authors:Simone Bendazzoli, Sanna Persson, Mehdi Astaraki, Sebastian Pettersson, Vitali Grozman, Rodrigo Moreno
Abstract:
The integration of Artificial Intelligence (AI) into clinical workflows requires robust collaborative platforms that are able to bridge the gap between technical innovation and practical healthcare applications. This paper introduces MAIA (Medical Artificial Intelligence Assistant), an open-source platform designed to facilitate interdisciplinary collaboration among clinicians, researchers, and AI developers. Built on Kubernetes, MAIA offers a modular, scalable environment with integrated tools for data management, model development, annotation, deployment, and clinical feedback. Key features include project isolation, CI/CD automation, integration with high-computing infrastructures and in clinical workflows. MAIA supports real-world use cases in medical imaging AI, with deployments in both academic and clinical environments. By promoting collaborations and interoperability, MAIA aims to accelerate the translation of AI research into impactful clinical solutions while promoting reproducibility, transparency, and user-centered design. We showcase the use of MAIA with different projects, both at KTH Royal Institute of Technology and Karolinska University Hospital.
Authors:Gabriel Recchia, Chatrik Singh Mangat, Jinu Nyachhyon, Mridul Sharma, Callum Canavan, Dylan Epstein-Gross, Muhammed Abdulbari
Abstract:
Scalable oversight protocols aim to empower evaluators to accurately verify AI models more capable than themselves. However, human evaluators are subject to biases that can lead to systematic errors. We conduct two studies examining the performance of simple oversight protocols where evaluators know that the model is "correct most of the time, but not all of the time". We find no overall advantage for the tested protocols, although in Study 1, showing arguments in favor of both answers improves accuracy in cases where the model is incorrect. In Study 2, participants in both groups become more confident in the system's answers after conducting online research, even when those answers are incorrect. We also reanalyze data from prior work that was more optimistic about simple protocols, finding that human evaluators possessing knowledge absent from models likely contributed to their positive results--an advantage that diminishes as models continue to scale in capability. These findings underscore the importance of testing the degree to which oversight protocols are robust to evaluator biases, whether they outperform simple deference to the model under evaluation, and whether their performance scales with increasing problem difficulty and model capability.
Authors:WiesÅaw KopeÄ, JarosÅaw Kowalski, Aleksander Majda, Anna Duszyk-Bogorodzka, Anna Jaskulska, Cezary Biele
Abstract:
We report preliminary insights from an exploratory study on non-standard non-invasive interfaces for Smart Home Technologies (SHT). This study is part of a broader research project on effective Smart Home ecosystem Sagacity that will target older adults, impaired persons, and other groups disadvantaged in the main technology discourse. Therefore, this research is in line with a long-term research framework of the HASE research group (Human Aspects in Science and Engineering) by the Living Lab Kobo. In our study, based on the prototype of the comprehensive SHT management system Sagacity, we investigated the potential of bioelectric signals, in particular EMG and EOG as a complementary interface for SHT. Based on our previous participatory research and studies on multimodal interfaces, including VUI and BCI, we prepared an in-depth interactive hands-on experience workshops with direct involvement of various groups of potential end users, including older adults and impaired persons (total 18 subjects) to explore and investigate the potential of solutions based on this type of non-standard interfaces. The preliminary insights from the study unveil the potential of EMG/EOG interfaces in multimodal SHT management, alongside limitations and challenges stemming from the current state of technology and recommendations for designing multimodal interaction paradigms pinpointing areas of interest to pursue in further studies.
Authors:Asad Ali Shahid, Angelo Moroncelli, Drazen Brscic, Takayuki Kanda, Loris Roveda
Abstract:
Recent progress in robot autonomy and safety has significantly improved human-robot interactions, enabling robots to work alongside humans on various tasks. However, complex assembly tasks still present significant challenges due to inherent task variability and the need for precise operations. This work explores deploying robots in an assistive role for such tasks, where the robot assists by fetching parts while the skilled worker provides high-level guidance and performs the assembly. We introduce GEAR, a gaze-enabled system designed to enhance human-robot collaboration by allowing robots to respond to the user's gaze. We evaluate GEAR against a touch-based interface where users interact with the robot through a touchscreen. The experimental study involved 30 participants working on two distinct assembly scenarios of varying complexity. Results demonstrated that GEAR enabled participants to accomplish the assembly with reduced physical demand and effort compared to the touchscreen interface, especially for complex tasks, maintaining great performance, and receiving objects effectively. Participants also reported enhanced user experience while performing assembly tasks. Project page: sites.google.com/view/gear-hri
Authors:Lillian Asiala, James E. McCarthy
Abstract:
Sonalysts, Inc. (Sonalysts) is working on an initiative to expand our expertise in teaming to include Human-Artificial Intelligence (AI) teams. The first step of this process is to develop a Synthetic Task Environment (STE) to support our original research. Prior knowledge elicitation efforts within the Human-AI teaming research stakeholder community revealed a desire to support data collection using pre- and post-performance surveys. In this technical report, we review a number of constructs that capture meaningful individual differences and teaming qualities. Additionally, we explore methods of measuring those constructs within the STE.
Authors:Rachel Ringe, Robin Nolte, Nima Zargham, Robert Porzel, Rainer Malaka
Abstract:
Robot appearance crucially shapes Human-Robot Interaction (HRI) but is typically described via broad categories like anthropomorphic, zoomorphic, or technical. More precise approaches focus almost exclusively on anthropomorphic features, which fail to classify robots across all types, limiting the ability to draw meaningful connections between robot design and its effect on interaction. In response, we present MetaMorph, a comprehensive framework for classifying robot morphology. Using a metamodeling approach, MetaMorph was synthesized from 222 robots in the IEEE Robots Guide, offering a structured method for comparing visual features. This model allows researchers to assess the visual distances between robot models and explore optimal design traits tailored to different tasks and contexts.
Authors:Armin Bernstetter, Tom Kwasnitschka, Isabella Peters
Abstract:
Ensuring reproducibility of research is an integral part of good scientific practice. One way to support this is through provenance: information about research workflows from data gathering to researchers' sensemaking processes leading to published results. This is highly important in disciplines such as geosciences, where researchers use software for interactive and immersive visualizations of geospatial data, doing virtual measurements in simulated fieldwork on 3D models. We evaluated a provenance management tool, which allows recording of interactions with a virtual fieldwork tool and annotating different states of the visualization. The user study investigated how researchers used this Digital Lab Book (DLB) and whether perceived ease of use and perceived usefulness differed between groups in immersive or non-immersive settings. Participants perceived the DLB as both useful and easy to use. While there were indications of differences in perceived ease of use (higher for immersive setting), usage patterns showed no significant group differences.
Authors:Donghoon Shin, Daniel Lee, Gary Hsieh, Gromit Yeuk-Yin Chan
Abstract:
Poster designing can benefit from synchronous feedback from target audiences. However, gathering audiences with diverse perspectives and reconciling them on design edits can be challenging. Recent generative AI models present opportunities to simulate human-like interactions, but it is unclear how they may be used for feedback processes in design. We introduce PosterMate, a poster design assistant that facilitates collaboration by creating audience-driven persona agents constructed from marketing documents. PosterMate gathers feedback from each persona agent regarding poster components, and stimulates discussion with the help of a moderator to reach a conclusion. These agreed-upon edits can then be directly integrated into the poster design. Through our user study (N=12), we identified the potential of PosterMate to capture overlooked viewpoints, while serving as an effective prototyping tool. Additionally, our controlled online evaluation (N=100) revealed that the feedback from an individual persona agent is appropriate given its persona identity, and the discussion effectively synthesizes the different persona agents' perspectives.
Authors:Dongyang Guo, Yasmeen Abdrabou, Enkeleda Thaqi, Enkelejda Kasneci
Abstract:
Eye-tracking data reveals valuable insights into users' cognitive states but is difficult to analyze due to its structured, non-linguistic nature. While large language models (LLMs) excel at reasoning over text, they struggle with temporal and numerical data. This paper presents a multimodal human-AI collaborative framework designed to enhance cognitive pattern extraction from eye-tracking signals. The framework includes: (1) a multi-stage pipeline using horizontal and vertical segmentation alongside LLM reasoning to uncover latent gaze patterns; (2) an Expert-Model Co-Scoring Module that integrates expert judgment with LLM output to generate trust scores for behavioral interpretations; and (3) a hybrid anomaly detection module combining LSTM-based temporal modeling with LLM-driven semantic analysis. Our results across several LLMs and prompt strategies show improvements in consistency, interpretability, and performance, with up to 50% accuracy in difficulty prediction tasks. This approach offers a scalable, interpretable solution for cognitive modeling and has broad potential in adaptive learning, human-computer interaction, and educational analytics.
Authors:Connor Scully-Allison, Kevin Menear, Kristin Potter, Andrew McNutt, Katherine E. Isaacs, Dmitry Duplyakin
Abstract:
Domain-specific visualizations sometimes focus on narrow, albeit important, tasks for one group of users. This focus limits the utility of a visualization to other groups working with the same data. While tasks elicited from other groups can present a design pitfall if not disambiguated, they also present a design opportunity -- development of visualizations that support multiple groups. This development choice presents a trade off of broadening the scope but limiting support for the more narrow tasks of any one group, which in some cases can enhance the overall utility of the visualization. We investigate this scenario through a design study where we develop \textit{Guidepost}, a notebook-embedded visualization of supercomputer queue data that helps scientists assess supercomputer queue wait times, machine learning researchers understand prediction accuracy, and system maintainers analyze usage trends. We adapt the use of personas for visualization design from existing literature in the HCI and software engineering domains and apply them in categorizing tasks based on their uniqueness across the stakeholder personas. Under this model, tasks shared between all groups should be supported by interactive visualizations and tasks unique to each group can be deferred to scripting with notebook-embedded visualization design. We evaluate our visualization with nine expert analysts organized into two groups: a "research analyst" group that uses supercomputer queue data in their research (representing the Machine Learning researchers and Jobs Data Analyst personas) and a "supercomputer user" group that uses this data conditionally (representing the HPC User persona). We find that our visualization serves our three stakeholder groups by enabling users to successfully execute shared tasks with point-and-click interaction while facilitating case-specific programmatic analysis workflows.
Authors:Josh Hunter, John McDermid, Simon Burton
Abstract:
This study investigates how railway professionals perceive safety as a concept within rail, with the intention to help inform future technological developments within the industry. Through a series of interviews with drivers, route planners,and administrative personnel, the research explores the currentstate of safety practices, the potential for automation and the understanding of the railway as a system of systems. Key findings highlight a cautious attitude towards automation, a preference for assistive technologies, and a complex understanding of safety that integrates human, systematic and technological factors. The study also addresses the limitations of transferring automotive automation technologies to railways and the need for a railway-specific causation model to better evaluate and enhance safety in an evolving technological landscape. This study aims to bridge thegap between contemporary research and practical applications, contributing to the development of more effective safety metrics.
Authors:Vita Santa Barletta, Vito Bavaro, Miriana Calvano, Antonio Curci, Antonio Piccinno, Davide Pio Posa
Abstract:
Digital Twins (DTs) are gaining prominence in cybersecurity for their ability to replicate complex IT (Information Technology), OT (Operational Technology), and IoT (Internet of Things) infrastructures, allowing for real time monitoring, threat analysis, and system simulation. This study investigates how integrating DTs with penetration testing tools and Large Language Models (LLMs) can enhance cybersecurity education and operational readiness. By simulating realistic cyber environments, this approach offers a practical, interactive framework for exploring vulnerabilities and defensive strategies. At the core of this research is the Red Team Knife (RTK), a custom penetration testing toolkit aligned with the Cyber Kill Chain model. RTK is designed to guide learners through key phases of cyberattacks, including reconnaissance, exploitation, and response within a DT powered ecosystem. The incorporation of Large Language Models (LLMs) further enriches the experience by providing intelligent, real-time feedback, natural language threat explanations, and adaptive learning support during training exercises. This combined DT LLM framework is currently being piloted in academic settings to develop hands on skills in vulnerability assessment, threat detection, and security operations. Initial findings suggest that the integration significantly improves the effectiveness and relevance of cybersecurity training, bridging the gap between theoretical knowledge and real-world application. Ultimately, the research demonstrates how DTs and LLMs together can transform cybersecurity education to meet evolving industry demands.
Authors:Rachel Ringe, Mihai Pomarlan, Nikolaos Tsiogkas, Stefano De Giorgis, Maria Hedblom, Rainer Malaka
Abstract:
Affordances - i.e. possibilities for action that an environment or objects in it provide - are important for robots operating in human environments to perceive. Existing approaches train such capabilities on annotated static images or shapes. This work presents a novel dataset for affordance learning of common household tasks. Unlike previous approaches, our dataset consists of video sequences demonstrating the tasks from first- and third-person perspectives, along with metadata about the affordances that are manifested in the task, and is aimed towards training perception systems to recognize affordance manifestations. The demonstrations were collected from several participants and in total record about seven hours of human activity. The variety of task performances also allows studying preparatory maneuvers that people may perform for a task, such as how they arrange their task space, which is also relevant for collaborative service robots.
Authors:Xin Chen, Yunhai Wang, Huaiwei Bao, Kecheng Lu, Jaemin Jo, Chi-Wing Fu, Jean-Daniel Fekete
Abstract:
We present a novel visualization-driven illumination model for density plots, a new technique to enhance density plots by effectively revealing the detailed structures in high- and medium-density regions and outliers in low-density regions, while avoiding artifacts in the density field's colors. When visualizing large and dense discrete point samples, scatterplots and dot density maps often suffer from overplotting, and density plots are commonly employed to provide aggregated views while revealing underlying structures. Yet, in such density plots, existing illumination models may produce color distortion and hide details in low-density regions, making it challenging to look up density values, compare them, and find outliers. The key novelty in this work includes (i) a visualization-driven illumination model that inherently supports density-plot-specific analysis tasks and (ii) a new image composition technique to reduce the interference between the image shading and the color-encoded density values. To demonstrate the effectiveness of our technique, we conducted a quantitative study, an empirical evaluation of our technique in a controlled study, and two case studies, exploring twelve datasets with up to two million data point samples.
Authors:Laura Moradbakhti, Dorian Peters, Jennifer K. Quint, Björn Schuller, Darren Cook, Rafael A. Calvo
Abstract:
Asthma-related deaths in the UK are the highest in Europe, and only 30% of patients access basic care. There is a need for alternative approaches to reaching people with asthma in order to provide health education, self-management support and bridges to care. Automated conversational agents (specifically, mobile chatbots) present opportunities for providing alternative and individually tailored access to health education, self-management support and risk self-assessment. But would patients engage with a chatbot, and what factors influence engagement? We present results from a patient survey (N=1257) devised by a team of asthma clinicians, patients, and technology developers, conducted to identify optimal factors for efficacy, value and engagement for a chatbot. Results indicate that most adults with asthma (53%) are interested in using a chatbot and the patients most likely to do so are those who believe their asthma is more serious and who are less confident about self-management. Results also indicate enthusiasm for 24/7 access, personalisation, and for WhatsApp as the preferred access method (compared to app, voice assistant, SMS or website). Obstacles to uptake include security/privacy concerns and skepticism of technological capabilities. We present detailed findings and consolidate these into 7 recommendations for developers for optimising efficacy of chatbot-based health support.
Authors:Natasha Yamane, Varun Mishra, Matthew S. Goodwin
Abstract:
In the last decade, researchers have increasingly explored using biosensing technologies for music-based affective regulation and stress management interventions in laboratory and real-world settings. These systems -- including interactive music applications, brain-computer interfaces, and biofeedback devices -- aim to provide engaging, personalized experiences that improve therapeutic outcomes. In this scoping and mapping review, we summarize and synthesize systematic reviews and empirical research on biosensing systems with potential applications in music-based affective regulation and stress management, identify gaps in the literature, and highlight promising areas for future research. We identified 28 studies involving 646 participants, with most systems utilizing prerecorded music, wearable cardiorespiratory sensors, or desktop interfaces. We categorize these systems based on their biosensing modalities, music types, computational models for affect or stress detection and music prediction, and biofeedback mechanisms. Our findings highlight the promising potential of these systems and suggest future directions, such as integrating multimodal biosensing, exploring therapeutic mechanisms of music, leveraging generative artificial intelligence for personalized music interventions, and addressing methodological, data privacy, and user control concerns.
Authors:Mohammad 'Matt' Namvarpour, Brandon Brofsky, Jessica Medina, Mamtaj Akter, Afsaneh Razi
Abstract:
As Generative Artificial Intelligence (GenAI) driven chatbots like Character.AI become embedded in adolescent life, they raise concerns about emotional dependence and digital overreliance. While studies have investigated the overreliance of adults on these chatbots, they have not investigated teens' interactions with chatbots with customizable personas. We analyzed 318 Reddit posts made by users self-reported as 13-17 years old on the Character.AI subreddit to understand patterns of overreliance. We found teens commonly begin using chatbots for emotional support or creative expression, but many develop strong attachments that interfere with offline relationships and daily routines. Their posts revealed recurring signs of psychological distress, cycles of relapse, and difficulty disengaging. Teens reported that their overreliance often ended when they reflect on the harm, return to in-person social settings, or become frustrated by platform restrictions. Based on the implications of our findings, we provide recommendations for future chatbot design so they can promote self-awareness, support real-world engagement, and involve teens in developing safer digital tools.
Authors:Argianto Rahartomo, Leonel Merino, Mohammad Ghafari
Abstract:
The rapid growth of metaverse technologies, including virtual worlds, augmented reality, and lifelogging, has accelerated their adoption across diverse domains. This rise exposes users to significant new security and privacy challenges due to sociotechnical complexity, pervasive connectivity, and extensive user data collection in immersive environments. We present a systematic review of the literature published between 2013 and 2024, offering a comprehensive analysis of how the research community has addressed metaverse-related security and privacy issues over the past decade. We organize the studies by method, examined the security and privacy properties, immersive components, and evaluation strategies. Our investigation reveals a sharp increase in research activity in the last five years, a strong focus on practical and user-centered approaches, and a predominant use of benchmarking, human experimentation, and qualitative methods. Authentication and unobservability are the most frequently studied properties. However, critical gaps remain in areas such as policy compliance, accessibility, interoperability, and back-end infrastructure security. We emphasize the intertwined technical complexity and human factors of the metaverse and call for integrated, interdisciplinary approaches to securing inclusive and trustworthy immersive environments.
Authors:Boning Zhao, Yutong Hu, Xinnuo Li
Abstract:
Large language models (LLMs) have demonstrated technical accuracy in high-risk domains, such as mental health support and special education. However, they often fail to meet the nuanced behavioral expectations of domain experts. This gap constrains AI deployment in sensitive settings. To address this challenge, we introduce LEKIA (Layered Expert Knowledge Injection Architecture), a novel framework built upon the principle of expert-owned AI behavior design. LEKIA's core innovation lies in its dual architecture: a three-layer knowledge injection system featuring our "Supervision Metaphor Cycle", and a dual-agent safety system ensuring robustness and consistency. We implemented and evaluated LEKIA within psychological support scenarios in special education. Experiments indicate that LEKIA improves performance by 14.8% over baseline, driven by substantive increase in alignment with expert expectations while preserving technical accuracy. Beyond providing a reproducible technical framework, this work demonstrates expert-expectation alignment as a measurable evaluation criterion with implications for AI deployment in high-risk domains.
Authors:Nils Mandischer, Alexander Atanasyan, Ulrich Dahmen, Michael Schluse, Jürgen Rossmann, Lars Mikelsons
Abstract:
The digital twin of humans is a relatively new concept. While many diverse definitions, architectures, and applications exist, a clear picture is missing on what, in fact, makes a human digital twin. Within this context, researchers and industrial use-case owners alike are unaware about the market potential of the - at the moment - rather theoretical construct. In this work, we draw a holistic vision of the human digital twin, and derive the specification of this holistic human digital twin in form of requirements, stakeholders, and users. For each group of users, we define exemplary applications that fall into the six levels of functionality: store, analyze, personalize, predict, control, and optimize. The functionality levels facilitate an abstraction of abilities of the human digital twin. From the manifold applications, we discuss three in detail to showcase the feasibility of the abstraction levels and the analysis of stakeholders and users. Based on the deep discussion, we derive a comprehensive list of requirements on the holistic human digital twin. These considerations shall be used as a guideline for research and industries for the implementation of human digital twins, particularly in context of reusability in multiple target applications.
Authors:Yanming Zhang, Krishnakumar Hegde, Klaus Mueller
Abstract:
Causality helps people reason about and understand complex systems, particularly through what-if analyses that explore how interventions might alter outcomes. Although existing methods embrace causal reasoning using interventions and counterfactual analysis, they primarily focus on effects at the population level. These approaches often fall short in systems characterized by significant heterogeneity, where the impact of an intervention can vary widely across subgroups. To address this challenge, we present XplainAct, a visual analytics framework that supports simulating, explaining, and reasoning interventions at the individual level within subpopulations. We demonstrate the effectiveness of XplainAct through two case studies: investigating opioid-related deaths in epidemiology and analyzing voting inclinations in the presidential election.
Authors:Xuetao Lin, Tianhao Peng, Peihong Dai, Yu Liang, Wenjun Wu
Abstract:
EEG-based emotion recognition plays an important role in developing adaptive brain-computer communication systems, yet faces two fundamental challenges in practical implementations: (1) effective integration of non-stationary spatial-temporal neural patterns, (2) robust adaptation to dynamic emotional intensity variations in real-world scenarios. This paper proposes SST-CL, a novel framework integrating spatial-temporal transformers with curriculum learning. Our method introduces two core components: a spatial encoder that models inter-channel relationships and a temporal encoder that captures multi-scale dependencies through windowed attention mechanisms, enabling simultaneous extraction of spatial correlations and temporal dynamics from EEG signals. Complementing this architecture, an intensity-aware curriculum learning strategy progressively guides training from high-intensity to low-intensity emotional states through dynamic sample scheduling based on a dual difficulty assessment. Comprehensive experiments on three benchmark datasets demonstrate state-of-the-art performance across various emotional intensity levels, with ablation studies confirming the necessity of both architectural components and the curriculum learning mechanism.
Authors:Albert Chen, Manas Bundele, Gaurav Ahlawat, Patrick Stetz, Zhitao Wang, Qiang Fei, Donghoon Jung, Audrey Chu, Bharadwaj Jayaraman, Ayushi Panth, Yatin Arora, Sourav Jain, Renjith Varma, Alexey Ilin, Iuliia Melnychuk, Chelsea Chueh, Joyan Sil, Xiaofeng Wang
Abstract:
The introduction of large language models has brought rapid progress on Text-to-SQL benchmarks, but it is not yet easy to build a working enterprise solution. In this paper, we present insights from building an internal chatbot that enables LinkedIn's product managers, engineers, and operations teams to self-serve data insights from a large, dynamic data lake. Our approach features three components. First, we construct a knowledge graph that captures up-to-date semantics by indexing database metadata, historical query logs, wikis, and code. We apply clustering to identify relevant tables for each team or product area. Second, we build a Text-to-SQL agent that retrieves and ranks context from the knowledge graph, writes a query, and automatically corrects hallucinations and syntax errors. Third, we build an interactive chatbot that supports various user intents, from data discovery to query writing to debugging, and displays responses in rich UI elements to encourage follow-up chats. Our chatbot has over 300 weekly users. Expert review shows that 53% of its responses are correct or close to correct on an internal benchmark set. Through ablation studies, we identify the most important knowledge graph and modeling components, offering a practical path for developing enterprise Text-to-SQL solutions.
Authors:Xianhao Carton Liu, Difan Jia, Tongyu Nie, Evan Suma Rosenberg, Victoria Interrante, Chen Zhu-Tian
Abstract:
In high-stakes, time-critical scenarios-such as emergency evacuation, first responder prioritization, and crisis management -- decision-makers must rapidly choose among spatial targets, such as exits, individuals to assist, or areas to secure. Advances in indoor sensing and artificial intelligence (AI) can support these decisions by visualizing real-time situational data and AI suggestions on 2D maps. However, mentally mapping this information onto real-world spaces imposes significant cognitive load. This load can impair users' ability to appropriately judge AI suggestions, leading to inappropriate reliance (e.g., accepting wrong AI suggestions or rejecting correct ones). Embedded visualizations in Augmented Reality (AR), by directly overlaying information onto physical environments, may reduce this load and foster more deliberate, appropriate reliance on AI. But is this true? In this work, we conducted an empirical study (N = 32) comparing AR see-through (embedded visualization) and 2D Minimap in time-critical, AI-assisted spatial target selection tasks. Contrary to our expectations, users exhibited greater inappropriate reliance on AI in the AR condition. Our analysis further reveals that this is primarily due to over-reliance, with factors specific to embedded visualizations, such as perceptual challenges, visual proximity illusions, and highly realistic visual representations. Nonetheless, embedded visualizations demonstrated notable benefits in spatial reasoning, such as spatial mapping and egocentric spatial imagery. We conclude by discussing the empirical insights, deriving design implications, and outlining important directions for future research on human-AI decision collaboration in AR.
Authors:Hanxiu 'Hazel' Zhu, Ruijia Chen, Yuhang Zhao
Abstract:
While videos have become increasingly prevalent in delivering information across different educational and professional contexts, individuals with ADHD often face attention challenges when watching informational videos due to the dynamic, multimodal, yet potentially distracting video elements. To understand and address this critical challenge, we designed \textit{FocusView}, a video customization interface that allows viewers with ADHD to customize informational videos from different aspects. We evaluated FocusView with 12 participants with ADHD and found that FocusView significantly improved the viewability of videos by reducing distractions. Through the study, we uncovered participants' diverse perceptions of video distractions (e.g., background music as a distraction vs. stimulation boost) and their customization preferences, highlighting unique ADHD-relevant needs in designing video customization interfaces (e.g., reducing the number of options to avoid distraction caused by customization itself). We further derived design considerations for future video customization systems for the ADHD community.
Authors:Ehud Sharlin, Benjamin Watson, Yoshifumi Kitamura, Fumio Kishino, Yuichi Itoh
Abstract:
Like the prehistoric twig and stone, tangible user interfaces (TUIs) are objects manipulated by humans. TUI success will depend on how well they exploit spatiality, the intuitive spatial skills humans have with the objects they use. In this paper we carefully examine the relationship between humans and physical objects, and related previous research. From this examination we distill a set of observations, and turn these into heuristics for incorporation of spatiality into TUI application design, a cornerstone for their success. Following this line of thought, we identify spatial TUIs, the subset of TUIs that mediate interaction with shape, space and structure. We then examine several existing spatial TUIs using our heuristics.
Authors:Rajdeep Mondal, Nikolaj Bjorner, Todd Millstein, Alan Tang, George Varghese
Abstract:
Beyond hallucinations, another problem in program synthesis using LLMs is ambiguity in user intent. We illustrate the ambiguity problem in a networking context for LLM-based incremental configuration synthesis of route-maps and ACLs. These structures frequently overlap in header space, making the relative priority of actions impossible for the LLM to infer without user interaction. Measurements in a large cloud identify complex ACLs with 100's of overlaps, showing ambiguity is a real problem. We propose a prototype system, Clarify, which uses an LLM augmented with a new module called a Disambiguator that helps elicit user intent. On a small synthetic workload, Clarify incrementally synthesizes routing policies after disambiguation and then verifies them. Our treatment of ambiguities is useful more generally when the intent of updates can be correctly synthesized by LLMs, but their integration is ambiguous and can lead to different global behaviors.
Authors:Chase Stokes, Anjana Arunkumar, Marti A. Hearst, Lace Padilla
Abstract:
Text is an integral but understudied component of visualization design. Although recent studies have examined how text elements (e.g., titles and annotations) influence comprehension, preferences, and predictions, many questions remain about textual design and use in practice. This paper introduces a framework for understanding text functions in information visualizations, building on and filling gaps in prior classifications and taxonomies. Through an analysis of 120 real-world visualizations and 804 text elements, we identified ten distinct text functions, ranging from identifying data mappings to presenting valenced subtext. We further identify patterns in text usage and conduct a factor analysis, revealing four overarching text-informed design strategies: Attribution and Variables, Annotation-Centric Design, Visual Embellishments, and Narrative Framing. In addition to these factors, we explore features of title rhetoric and text multifunctionality, while also uncovering previously unexamined text functions, such as text replacing visual elements. Our findings highlight the flexibility of text, demonstrating how different text elements in a given design can combine to communicate, synthesize, and frame visual information. This framework adds important nuance and detail to existing frameworks that analyze the diverse roles of text in visualization.
Authors:Jiangnan Xu, Haeseul Cha, Gosu Choi, Gyu-cheol Lee, Yeo-Jin Yoon, Zucheul Lee, Konstantinos Papangelis, Dae Hyun Kim, Juho Kim
Abstract:
An interactive vignette is a popular and immersive visual storytelling approach that invites viewers to role-play a character and influences the narrative in an interactive environment. However, it has not been widely used by everyday storytellers yet due to authoring complexity, which conflicts with the immediacy of everyday storytelling. We introduce DiaryPlay, an AI-assisted authoring system for interactive vignette creation in everyday storytelling. It takes a natural language story as input and extracts the three core elements of an interactive vignette (environment, characters, and events), enabling authors to focus on refining these elements instead of constructing them from scratch. Then, it automatically transforms the single-branch story input into a branch-and-bottleneck structure using an LLM-powered narrative planner, which enables flexible viewer interactions while freeing the author from multi-branching. A technical evaluation (N=16) shows that DiaryPlay-generated character activities are on par with human-authored ones regarding believability. A user study (N=16) shows that DiaryPlay effectively supports authors in creating interactive vignette elements, maintains authorial intent while reacting to viewer interactions, and provides engaging viewing experiences.
Authors:Wenqing Wu, Chengzhi Zhang, Yi Zhao
Abstract:
Novelty is a crucial criterion in the peer review process for evaluating academic papers. Traditionally, it's judged by experts or measure by unique reference combinations. Both methods have limitations: experts have limited knowledge, and the effectiveness of the combination method is uncertain. Moreover, it's unclear if unique citations truly measure novelty. The large language model (LLM) possesses a wealth of knowledge, while human experts possess judgment abilities that the LLM does not possess. Therefore, our research integrates the knowledge and abilities of LLM and human experts to address the limitations of novelty assessment. One of the most common types of novelty in academic papers is the introduction of new methods. In this paper, we propose leveraging human knowledge and LLM to assist pretrained language models (PLMs, e.g. BERT etc.) in predicting the method novelty of papers. Specifically, we extract sentences related to the novelty of the academic paper from peer review reports and use LLM to summarize the methodology section of the academic paper, which are then used to fine-tune PLMs. In addition, we have designed a text-guided fusion module with novel Sparse-Attention to better integrate human and LLM knowledge. We compared the method we proposed with a large number of baselines. Extensive experiments demonstrate that our method achieves superior performance.
Authors:Rushia Harada, Yuken Kimura, Keito Inoshita
Abstract:
Well-being in family settings involves subtle psychological dynamics that conventional metrics often overlook. In particular, unconscious parental expectations, termed ideal parent bias, can suppress children's emotional expression and autonomy. This suppression, referred to as suppressed emotion, often stems from well-meaning but value-driven communication, which is difficult to detect or address from outside the family. Focusing on these latent dynamics, this study explores Large Language Model (LLM)-based support for psychologically safe family communication. We constructed a Japanese parent-child dialogue corpus of 30 scenarios, each annotated with metadata on ideal parent bias and suppressed emotion. Based on this corpus, we developed a Role-Playing LLM-based multi-agent dialogue support framework that analyzes dialogue and generates feedback. Specialized agents detect suppressed emotion, describe implicit ideal parent bias in parental speech, and infer contextual attributes such as the child's age and background. A meta-agent compiles these outputs into a structured report, which is then passed to five selected expert agents. These agents collaboratively generate empathetic and actionable feedback through a structured four-step discussion process. Experiments show that the system can detect categories of suppressed emotion with moderate accuracy and produce feedback rated highly in empathy and practicality. Moreover, simulated follow-up dialogues incorporating this feedback exhibited signs of improved emotional expression and mutual understanding, suggesting the framework's potential in supporting positive transformation in family interactions.
Authors:Mikko Korkiakoski, Saeid Sheikhi, Jesper Nyman, Jussi Saariniemi, Kalle Tapio, Panos Kostakos
Abstract:
Advancements in artificial intelligence (AI) have significantly enhanced the realism and interactivity of non-player characters (NPCs) in virtual reality (VR), creating more engaging and believable user experiences. This paper evaluates AI-driven NPCs within a VR interrogation simulator, focusing on their perceived realism, usability, and system performance. The simulator features two AI-powered NPCs, a suspect, and a partner, using GPT-4 Turbo to engage participants in a scenario to determine the suspect's guilt or innocence. A user study with 18 participants assessed the system using the System Usability Scale (SUS), Game Experience Questionnaire (GEQ), and a Virtual Agent Believability Questionnaire, alongside latency measurements for speech-to-text (STT), text-to-speech (TTS), OpenAI GPT-4 Turbo, and overall (cycle) latency. Results showed an average cycle latency of 7 seconds, influenced by the increasing conversational context. Believability scored 6.67 out of 10, with high ratings in behavior, social relationships, and intelligence but moderate scores in emotion and personality. The system achieved a SUS score of 79.44, indicating good usability. These findings demonstrate the potential of large language models to improve NPC realism and interaction in VR while highlighting challenges in reducing system latency and enhancing emotional depth. This research contributes to the development of more sophisticated AI-driven NPCs, revealing the need for performance optimization to achieve increasingly immersive virtual experiences.
Authors:Joel Becker, Nate Rush, Elizabeth Barnes, David Rein
Abstract:
Despite widespread adoption, the impact of AI tools on software development in the wild remains understudied. We conduct a randomized controlled trial (RCT) to understand how AI tools at the February-June 2025 frontier affect the productivity of experienced open-source developers. 16 developers with moderate AI experience complete 246 tasks in mature projects on which they have an average of 5 years of prior experience. Each task is randomly assigned to allow or disallow usage of early 2025 AI tools. When AI tools are allowed, developers primarily use Cursor Pro, a popular code editor, and Claude 3.5/3.7 Sonnet. Before starting tasks, developers forecast that allowing AI will reduce completion time by 24%. After completing the study, developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%--AI tooling slowed developers down. This slowdown also contradicts predictions from experts in economics (39% shorter) and ML (38% shorter). To understand this result, we collect and evaluate evidence for 20 properties of our setting that a priori could contribute to the observed slowdown effect--for example, the size and quality standards of projects, or prior developer experience with AI tooling. Although the influence of experimental artifacts cannot be entirely ruled out, the robustness of the slowdown effect across our analyses suggests it is unlikely to primarily be a function of our experimental design.
Authors:Aashish Panta, Amy Gooch, Giorgio Scorzelli, Michela Taufer, Valerio Pascucci
Abstract:
The growing resolution and volume of climate data from remote sensing and simulations pose significant storage, processing, and computational challenges. Traditional compression or subsampling methods often compromise data fidelity, limiting scientific insights. We introduce a scalable ecosystem that integrates hierarchical multiresolution data management, intelligent transmission, and ML-assisted reconstruction to balance accuracy and efficiency. Our approach reduces storage and computational costs by 99\%, lowering expenses from \$100,000 to \$24 while maintaining a Root Mean Square (RMS) error of 1.46 degrees Celsius. Our experimental results confirm that even with significant data reduction, essential features required for accurate climate analysis are preserved. Validated on petascale NASA climate datasets, this solution enables cost-effective, high-fidelity climate analysis for research and decision-making.
Authors:Kayhan Latifzadeh, Jacek Gwizdka, Luis A. Leiva
Abstract:
We contribute a comprehensive dataset to study user attention and purchasing behavior on Search Engine Result Pages (SERPs). Previous work has relied on mouse movements as a low-cost large-scale behavioral proxy but also has relied on self-reported ground-truth labels, collected at post-task, which can be inaccurate and prone to biases. To address this limitation, we use an eye tracker to construct an objective ground-truth of continuous visual attention. Our dataset comprises 2,776 transactional queries on Google SERPs, collected from 47 participants, and includes: (1) HTML source files, with CSS and images; (2) rendered SERP screenshots; (3) eye movement data; (4) mouse movement data; (5) bounding boxes of direct display and organic advertisements; and (6) scripts for further preprocessing the data. In this paper we provide an overview of the dataset and baseline experiments (classification tasks) that can inspire researchers about the different possibilities for future work.
Authors:Karisa Parkington, Bazen G. Teferra, Marianne Rouleau-Tang, Argyrios Perivolaris, Alice Rueda, Adam Dubrowski, Bill Kapralos, Reza Samavi, Andrew Greenshaw, Yanbo Zhang, Bo Cao, Yuqi Wu, Sirisha Rambhatla, Sridhar Krishnan, Venkat Bhat
Abstract:
Thematic analysis provides valuable insights into participants' experiences through coding and theme development, but its resource-intensive nature limits its use in large healthcare studies. Large language models (LLMs) can analyze text at scale and identify key content automatically, potentially addressing these challenges. However, their application in mental health interviews needs comparison with traditional human analysis. This study evaluates out-of-the-box and knowledge-base LLM-based thematic analysis against traditional methods using transcripts from a stress-reduction trial with healthcare workers. OpenAI's GPT-4o model was used along with the Role, Instructions, Steps, End-Goal, Narrowing (RISEN) prompt engineering framework and compared to human analysis in Dedoose. Each approach developed codes, noted saturation points, applied codes to excerpts for a subset of participants (n = 20), and synthesized data into themes. Outputs and performance metrics were compared directly. LLMs using the RISEN framework developed deductive parent codes similar to human codes, but humans excelled in inductive child code development and theme synthesis. Knowledge-based LLMs reached coding saturation with fewer transcripts (10-15) than the out-of-the-box model (15-20) and humans (90-99). The out-of-the-box LLM identified a comparable number of excerpts to human researchers, showing strong inter-rater reliability (K = 0.84), though the knowledge-based LLM produced fewer excerpts. Human excerpts were longer and involved multiple codes per excerpt, while LLMs typically applied one code. Overall, LLM-based thematic analysis proved more cost-effective but lacked the depth of human analysis. LLMs can transform qualitative analysis in mental healthcare and clinical research when combined with human oversight to balance participant perspectives and research resources.
Authors:Yasmin Elsaddik Valdivieso, Mohd Faisal, Karim Alghoul, Monireh, Vahdati, Kamran Gholizadeh Hamlabadi, Fedwa Laamarti, Hussein Al Osman, Abdulmotaleb El Saddik
Abstract:
Immersive virtual reality (VR) is a promising tool for stress reduction and relaxation, traditionally relying on visual and auditory stimuli. This study examines the role of olfactory stimuli in enhancing these effects, using a randomized within-subject design. Thirty participants aged 18-60 experienced VR scenarios simulating a calming seaside environment, with sessions lasting 45 minutes, in two conditions: with and without a "Beach" essential oil scent (Yankee Candle) administered via diffuser. Stress and relaxation were assessed through self-reported surveys and physiological measures, specifically ECG-based heart rate variability (HRV). Results showed no significant difference in self-reported relaxation scores (p=0.371) between conditions, but HRV analysis revealed a significant stress reduction (p=0.002) with olfactory input, with HF increasing 108% from the Math Stress Test to the scented relaxation condition, compared to 44% without scent. Additionally, 71.4% of participants expressed willingness to use olfactory-enhanced VR for relaxation, suggesting practical appeal. These findings indicate that olfactory stimuli may enhance relaxation subconsciously, underscoring the importance of multisensory integration in VR. Future work could explore personalized scents and long-term effects to optimize VR- based interventions for emotional and physical well-being.
Authors:Nils Mandischer, Larissa Füller, Torsten Alles, Frank Flemisch, Lars Mikelsons
Abstract:
Human and automation capabilities are the foundation of every human-autonomy interaction and interaction pattern. Therefore, machines need to understand the capacity and performance of human doing, and adapt their own behavior, accordingly. In this work, we address the concept of conjugated capabilities, i.e. capabilities that are dependent or interrelated and between which effort can be distributed. These may be used to overcome human limitations, by shifting effort from a deficient to a conjugated capability with performative resources. For example: A limited arm's reach may be compensated by tilting the torso forward. We analyze the interrelation between elementary capabilities within the IMBA standard to uncover potential conjugation, and show evidence in data of post-rehabilitation patients. From the conjugated capabilities, within the example application of stationary manufacturing, we create a network of interrelations. With this graph, a manifold of potential uses is enabled. We showcase the graph's usage in optimizing IMBA test design to accelerate data recordings, and discuss implications of conjugated capabilities on task allocation between the human and an autonomy.
Authors:Marianne Bossema, Rob Saunders, Aske Plaat, Somaya Ben Allouch
Abstract:
This position paper explores pluriperspectivism as a core element of human creative experience and its relevance to humanrobot cocreativity We propose a layered fivedimensional model to guide the design of cocreative behaviors and the analysis of interaction dynamics This model is based on literature and results from an interview study we conducted with 10 visual artists and 8 arts educators examining how pluriperspectivism supports creative practice The findings of this study provide insight in how robots could enhance human creativity through adaptive contextsensitive behavior demonstrating the potential of pluriperspectivism This paper outlines future directions for integrating pluriperspectivism with visionlanguage models VLMs to support context sensitivity in cocreative robots
Authors:Lucile Favero, Juan-Antonio Pérez-Ortiz, Tanja Käser, Nuria Oliver
Abstract:
The increasing integration of AI tools in education presents both opportunities and challenges, particularly regarding the development of the students' critical thinking skills. This position paper argues that while AI can support learning, its unchecked use may lead to cognitive atrophy, loss of agency, emotional risks, and ethical concerns, ultimately undermining the core goals of education. Drawing on cognitive science and pedagogy, the paper explores how over-reliance on AI can disrupt meaningful learning, foster dependency and conformity, undermine the students' self-efficacy, academic integrity, and well-being, and raise concerns about questionable privacy practices. It also highlights the importance of considering the students' perspectives and proposes actionable strategies to ensure that AI serves as a meaningful support rather than a cognitive shortcut. The paper advocates for an intentional, transparent, and critically informed use of AI that empowers rather than diminishes the learner.
Authors:Martin Wimpff, Jan Zerfowski, Bin Yang
Abstract:
Despite the growing success of deep learning (DL) in offline brain-computer interfaces (BCIs), its adoption in real-time applications remains limited due to three primary challenges. First, most DL solutions are designed for offline decoding, making the transition to online decoding unclear. Second, the use of sliding windows in online decoding substantially increases computational complexity. Third, DL models typically require large amounts of training data, which are often scarce in BCI applications. To address these challenges and enable real-time, cross-subject decoding without subject-specific calibration, we introduce realtime adaptive pooling (RAP), a novel parameter-free method. RAP seamlessly modifies the pooling layers of existing offline DL models to meet online decoding requirements. It also reduces computational complexity during training by jointly decoding consecutive sliding windows. To further alleviate data requirements, our method leverages source-free domain adaptation, enabling privacy-preserving adaptation across varying amounts of target data. Our results demonstrate that RAP provides a robust and efficient framework for real-time BCI applications. It preserves privacy, reduces calibration demands, and supports co-adaptive BCI systems, paving the way for broader adoption of DL in online BCIs. These findings lay a strong foundation for developing user-centered, high-performance BCIs that facilitate immediate feedback and user learning.
Authors:Neil Rathi, Dan Jurafsky, Kaitlyn Zhou
Abstract:
As large language models (LLMs) are deployed globally, it is crucial that their responses are calibrated across languages to accurately convey uncertainty and limitations. Prior work shows that LLMs are linguistically overconfident in English, leading users to overrely on confident generations. However, the usage and interpretation of epistemic markers (e.g., 'I think it's') differs sharply across languages. Here, we study the risks of multilingual linguistic (mis)calibration, overconfidence, and overreliance across five languages to evaluate LLM safety in a global context. Our work finds that overreliance risks are high across languages. We first analyze the distribution of LLM-generated epistemic markers and observe that LLMs are overconfident across languages, frequently generating strengtheners even as part of incorrect responses. Model generations are, however, sensitive to documented cross-linguistic variation in usage: for example, models generate the most markers of uncertainty in Japanese and the most markers of certainty in German and Mandarin. Next, we measure human reliance rates across languages, finding that reliance behaviors differ cross-linguistically: for example, participants are significantly more likely to discount expressions of uncertainty in Japanese than in English (i.e., ignore their 'hedging' function and rely on generations that contain them). Taken together, these results indicate a high risk of reliance on overconfident model generations across languages. Our findings highlight the challenges of multilingual linguistic calibration and stress the importance of culturally and linguistically contextualized model safety evaluations.
Authors:Shuning Jiang, Wei-Lun Chao, Daniel Haehn, Hanspeter Pfister, Jian Chen
Abstract:
We present a data-domain sampling regime for quantifying CNNs' graphic perception behaviors. This regime lets us evaluate CNNs' ratio estimation ability in bar charts from three perspectives: sensitivity to training-test distribution discrepancies, stability to limited samples, and relative expertise to human observers. After analyzing 16 million trials from 800 CNNs models and 6,825 trials from 113 human participants, we arrived at a simple and actionable conclusion: CNNs can outperform humans and their biases simply depend on the training-test distance. We show evidence of this simple, elegant behavior of the machines when they interpret visualization images. osf.io/gfqc3 provides registration, the code for our sampling regime, and experimental results.
Authors:Nikhita Joshi, Daniel Vogel
Abstract:
Writing longer prompts for an AI assistant to generate a short story increases psychological ownership, a user's feeling that the writing belongs to them. To encourage users to write longer prompts, we evaluated two interaction techniques that modify the prompt entry interface of chat-based generative AI assistants: pressing and holding the prompt submission button, and continuously moving a slider up and down when submitting a short prompt. A within-subjects experiment investigated the effects of such techniques on prompt length and psychological ownership, and results showed that these techniques increased prompt length and led to higher psychological ownership than baseline techniques. A second experiment further augmented these techniques by showing AI-generated suggestions for how the prompts could be expanded. This further increased prompt length, but did not lead to improvements in psychological ownership. Our results show that simple interface modifications like these can elicit more writing from users and improve psychological ownership.
Authors:Anastasia Sergeeva, Claudia Negri-Ribalta, Gabriele Lenzini
Abstract:
Although extended reality(XR)-using technologies have started to be discussed in the industrial setting, it is becoming important to understand how to implement them ethically and privacy-preservingly. In our paper, we summarise our experience of developing XR implementations for the off-highway machinery domain by pointing to the main challenges we identified during the work. We believe that our findings can be a starting point for further discussion and future research regarding privacy and ethical challenges in industrial applications of XR.
Authors:Ehud Sharlin, Pablo Figueroa, Mark Green, Benjamin Watson
Abstract:
CAVE displays offer many advantages over other virtual reality (VR) displays, including a large, unencumbering viewing space. Unfortunately, the typical tracking subsystems used with CAVE displays tether the user and lessen this advantage. We have designed a simple, low-cost feet tracker that is wireless, leaving the user free to move. The tracker can be assembled for less than $200 US, and achieves an accuracy of 10 cm at a 20 Hz sampling rate. We have tested the prototype with two applications: a visualization supporting close visual inspection, and a walkthrough of the campus. Although the tracking was convincing, it was clear that the tracker's limitations make it less than ideal for applications requiring precise visual inspection. However, the freedom of motion allowed by the tracker was a compelling supplement to our campus walkthrough, allowing users to stroll and look around corners.
Authors:Ehud Sharlin, Yuichi Itoh, Benjamin Watson, Yoshifumi Kitamura, Steve Sutphen, Lili Liu, Fumio Kishino
Abstract:
This paper discusses Tangible User Interfaces (TUIs) and their potential impact on cognitive assessment and cognitive training. We believe that TUIs, and particularly a subset that we dub spatial TUIs, can extend human computer interaction beyond some of its current limitations. Spatial TUIs exploit human innate spatial and tactile ability in an intuitive and direct manner, affording interaction paradigms that are practically impossible using current interface technology. As proof-of-concept we examine implementations in the field of cognitive assessment and training. In this paper we use Cognitive Cubes, a novel TUI we developed, as an applied test bed for our beliefs, presenting promising experimental results for cognitive assessment of spatial ability, and possibly for training purposes.
Authors:Devin Y. De Silva, Sandareka Wickramanayake, Dulani Meedeniya, Sanka Rasnayaka
Abstract:
Human Activity Recognition (HAR), which uses data from Inertial Measurement Unit (IMU) sensors, has many practical applications in healthcare and assisted living environments. However, its use in real-world scenarios has been limited by the lack of comprehensive IMU-based HAR datasets that cover a wide range of activities and the lack of transparency in existing HAR models. Zero-shot HAR (ZS-HAR) overcomes the data limitations, but current models struggle to explain their decisions, making them less transparent. This paper introduces a novel IMU-based ZS-HAR model called the Self-Explainable Zero-shot Human Activity Recognition Network (SEZ-HARN). It can recognize activities not encountered during training and provide skeleton videos to explain its decision-making process. We evaluate the effectiveness of the proposed SEZ-HARN on four benchmark datasets PAMAP2, DaLiAc, HTD-MHAD and MHealth and compare its performance against three state-of-the-art black-box ZS-HAR models. The experiment results demonstrate that SEZ-HARN produces realistic and understandable explanations while achieving competitive Zero-shot recognition accuracy. SEZ-HARN achieves a Zero-shot prediction accuracy within 3\% of the best-performing black-box model on PAMAP2 while maintaining comparable performance on the other three datasets.
Authors:Bernd Hofmann, Sven Kreitlein, Joerg Franke, Patrick Bruendl
Abstract:
This paper proposes an agent-based approach toward a more natural interface between humans and machines. Large language models equipped with tools and the communication standard OPC UA are utilized to control machines in natural language. Instead of touch interaction, which is currently the state-of-the-art medium for interaction in operations, the proposed approach enables operators to talk or text with machines. This allows commands such as 'Please decrease the temperature by 20 % in machine 1 and set the motor speed to 5000 rpm in machine 2.' The large language model receives the user input and selects one of three predefined tools that connect to an OPC UA server and either change or read the value of a node. Afterwards, the result of the tool execution is passed back to the language model, which then provides a final response to the user. The approach is universally designed and can therefore be applied to any machine that supports the OPC UA standard. The large language model is neither fine-tuned nor requires training data, only the relevant machine credentials and a parameter dictionary are included within the system prompt. The approach is evaluated on a Siemens S7-1500 programmable logic controller with four machine parameters in a case study of fifty synthetically generated commands on five different models. The results demonstrate high success rate, with proprietary GPT 5 models achieving accuracies between 96.0 % and 98.0 %, and open-weight models reaching up to 90.0 %. The proposed approach of this empirical study contributes to advancing natural interaction in industrial human-machine interfaces.
Authors:Michele Romani, Devis Zanoni, Elisabetta Farella, Luca Turchet
Abstract:
$\textit{BrainForm}$ is a gamified Brain-Computer Interface (BCI) training system designed for scalable data collection using consumer hardware and a minimal setup. We investigated (1) how users develop BCI control skills across repeated sessions and (2) perceptual and performance effects of two visual stimulation textures. Game Experience Questionnaire (GEQ) scores for Flow}, Positive Affect, Competence and Challenge were strongly positive, indicating sustained engagement. A within-subject study with multiple runs, two task complexities, and post-session questionnaires revealed no significant performance differences between textures but increased ocular irritation over time. Online metrics$\unicode{x2013}$Task Accuracy, Task Time, and Information Transfer Rate$\unicode{x2013}$improved across sessions, confirming learning effects for symbol spelling, even under pressure conditions. Our results highlight the potential of $\textit{BrainForm}$ as a scalable, user-friendly BCI research tool and offer guidance for sustained engagement and reduced training fatigue.
Authors:Jiawen Li, Zheng Ning, Yuan Tian, Toby Jia-jun Li
Abstract:
Large language models (LLMs) enable end-users to delegate complex tasks to autonomous agents through natural language. However, prompt-based interaction faces critical limitations: Users often struggle to specify procedural requirements for tasks, especially those that don't have a factually correct solution but instead rely on personal preferences, such as posting social media content or planning a trip. Additionally, a ''successful'' prompt for one task may not be reusable or generalizable across similar tasks. We present ALLOY, a system inspired by classical HCI theories on Programming by Demonstration (PBD), but extended to enhance adaptability in creating LLM-based web agents. ALLOY enables users to express procedural preferences through natural demonstrations rather than prompts, while making these procedures transparent and editable through visualized workflows that can be generalized across task variations. In a study with 12 participants, ALLOY's demonstration--based approach outperformed prompt-based agents and manual workflows in capturing user intent and procedural preferences in complex web tasks. Insights from the study also show how demonstration--based interaction complements the traditional prompt-based approach.
Authors:Yaxuan Mao, Yanheng Li, Duo Gong, Pengcheng An, Yuhan Luo
Abstract:
Dental anxiety is prevalent among children, often leading to missed treatment and potential negative effects on their mental well-being. While several interventions (e.g., pharmacological and psychotherapeutic techniques) have been introduced for anxiety alleviation, the recently emerged virtual reality (VR) technology, with its immersive and playful nature, opened new opportunities for complementing and enhancing the therapeutic effects of existing interventions. In this light, we conducted a series of co-design workshops with 13 children aged 10-12 to explore how they envisioned using VR to address their fear and stress associated with dental visits, followed by interviews with parents (n = 13) and two dentists. Our findings revealed that children expected VR to provide immediate relief, social support, and a sense of control during dental treatment, parents sought educational opportunities for their children to learn about oral health, and dentists prioritized treatment efficiency and safety issues. Drawing from the findings, we discuss the considerations of multi-stakeholders for developing VR-assisted anxiety management applications for children within and beyond dental settings.
Authors:Zekun Wu, Anna Maria Feit
Abstract:
Monitoring interfaces are crucial for dynamic, highstakes tasks where effective user attention is essential. Visual highlights can guide attention effectively but may also introduce unintended disruptions. To investigate this, we examined how visual highlights affect users' gaze behavior in a drone monitoring task, focusing on when, how long, and how much attention they draw. We found that highlighted areas exhibit distinct temporal characteristics compared to non-highlighted ones, quantified using normalized saliency (NS) metrics. Highlights elicited immediate responses, with NS peaking quickly, but this shift came at the cost of reduced search efforts elsewhere, potentially impacting situational awareness. To predict these dynamic changes and support interface design, we developed the Highlight-Informed Saliency Model (HISM), which provides granular predictions of NS over time. These predictions enable evaluations of highlight effectiveness and inform the optimal timing and deployment of highlights in future monitoring interface designs, particularly for time-sensitive tasks.
Authors:Victor Victor, Tania Krisanty, Matthew McGinity, Stefan Gumhold, Uwe Aßmann
Abstract:
As the markets for unmanned aerial vehicles (UAVs) and mixed reality (MR) headsets continue to grow, recent research has increasingly explored their integration, which enables more intuitive, immersive, and situationally aware control systems. We present IGUANA, an MR-based immersive guidance, navigation, and control system for consumer UAVs. IGUANA introduces three key elements beyond conventional control interfaces: (1) a 3D terrain map interface with draggable waypoint markers and live camera preview for high-level control, (2) a novel spatial control metaphor that uses a virtual ball as a physical analogy for low-level control, and (3) a spatial overlay that helps track the UAV when it is not visible with the naked eye or visual line of sight is interrupted. We conducted a user study to evaluate our design, both quantitatively and qualitatively, and found that (1) the 3D map interface is intuitive and easy to use, relieving users from manual control and suggesting improved accuracy and consistency with lower perceived workload relative to conventional dual-stick controller, (2) the virtual ball interface is intuitive but limited by the lack of physical feedback, and (3) the spatial overlay is very useful in enhancing the users' situational awareness.
Authors:Yichuan Zhang, Liangyuting Zhang, Xuning Hu, Yong Yue, Hai-Ning Liang
Abstract:
Gaze input, as a modality inherently conveying user intent, offers intuitive and immersive experiences in extended reality (XR). With eye-tracking now being a standard feature in modern XR headsets, gaze has been extensively applied to tasks such as selection, text entry, and object manipulation. However, gaze based navigation despite being a fundamental interaction task remains largely underexplored. In particular, little is known about which path types are well suited for gaze navigation and under what conditions it performs effectively. To bridge this gap, we conducted a controlled user study evaluating gaze-based navigation across three representative path types: linear, narrowing, and circular. Our findings reveal distinct performance characteristics and parameter ranges for each path type, offering design insights and practical guidelines for future gaze-driven navigation systems in XR.
Authors:Francesca Cocchella, Nilay Roy Choudhury, Eric Chen, Patrícia Alves-Oliveira
Abstract:
As robotic technologies evolve, their potential in artistic creation becomes an increasingly relevant topic of inquiry. This study explores how professional abstract artists perceive and experience co-creative interactions with an autonomous painting robotic arm. Eight artists engaged in six painting sessions -- three with a human partner, followed by three with the robot -- and subsequently participated in semi-structured interviews analyzed through reflexive thematic analysis. Human-human interactions were described as intuitive, dialogic, and emotionally engaging, whereas human-robot sessions felt more playful and reflective, offering greater autonomy and prompting for novel strategies to overcome the system's limitations. This work offers one of the first empirical investigations into artists' lived experiences with a robot, highlighting the value of long-term engagement and a multidisciplinary approach to human-robot co-creation.
Authors:Takahiro Matsumoto, Takahiro Kusabuka, Hiroshi Chigira, Kazuhiko Murasaki, Kakagu Komazaki, Masafumi Suzuki, Masakatsu Aoki
Abstract:
We present a low-latency tele-immersive entertainment system that streams 3D point clouds and performers' footstep vibrations, creating the sense that the stage is present. Moving performers and their surroundings are captured as dynamic point clouds under rapidly changing lighting, then processed, transmitted, and rendered within a total latency of less than 100 ms. Under high ambient noise, footstep vibrations are sensed by wearable accelerometers. Real-time visual and haptic streams are delivered to a remote venue, where a large 3D LED wall and a vibration-efficient haptic floor envelop dozens of spectators. A public trial at Expo 2025 linked sites 20 km apart: visitors watched a live dance show and conversed with performers without noticeable delay.
Authors:Renee Shelby, Fernando Diaz, Vinodkumar Prabhakaran
Abstract:
The growing ubiquity of conversational AI highlights the need for frameworks that capture not only users' instrumental goals but also the situated, adaptive, and social practices through which they achieve them. Existing taxonomies of conversational behavior either overgeneralize, remain domain-specific, or reduce interactions to narrow dialogue functions. To address this gap, we introduce the Taxonomy of User Needs and Actions (TUNA), an empirically grounded framework developed through iterative qualitative analysis of 1193 human-AI conversations, supplemented by theoretical review and validation across diverse contexts. TUNA organizes user actions into a three-level hierarchy encompassing behaviors associated with information seeking, synthesis, procedural guidance, content creation, social interaction, and meta-conversation. By centering user agency and appropriation practices, TUNA enables multi-scale evaluation, supports policy harmonization across products, and provides a backbone for layering domain-specific taxonomies. This work contributes a systematic vocabulary for describing AI use, advancing both scholarly understanding and practical design of safer, more responsive, and more accountable conversational systems.
Authors:Matthew Jörke, Defne Genç, Valentin Teutschbein, Shardul Sapkota, Sarah Chung, Paul Schmiedmayer, Maria Ines Campero, Abby C. King, Emma Brunskill, James A. Landay
Abstract:
Large language models (LLMs) offer novel opportunities to support health behavior change, yet existing work has narrowly focused on text-only interactions. Building on decades of HCI research demonstrating the effectiveness of UI-based interactions, we present Bloom, an application for physical activity promotion that integrates an LLM-based health coaching chatbot with established UI-based interactions. As part of Bloom's development, we conducted a redteaming evaluation and contribute a safety benchmark dataset. In a four-week randomized field study (N=54) comparing Bloom to a non-LLM control, we observed important shifts in psychological outcomes: participants in the LLM condition reported stronger beliefs that activity was beneficial, greater enjoyment, and more self-compassion. Both conditions significantly increased physical activity levels, doubling the proportion of participants meeting recommended weekly guidelines, though we observed no significant differences between conditions. Instead, our findings suggest that LLMs may be more effective at shifting mindsets that precede longer-term behavior change.
Authors:Mingjin Li, Yu Liu, Huayi Liu, Xiang Ye, Chao Jiang, Hongguang Zhang
Abstract:
We propose MADS (Multi-Agent Dialogue Simulation), a scalable framework for generating persuasive multi-turn dialogues via agent self-play. MADS employs three coordinated agents: User Agents simulating diverse persona-driven behaviors, a Dialog Agent executing task-oriented persuasion strategies and an Optimization Agent evaluating and refining dialogue outcomes. We further validate its effectiveness through users' Chain-of-Attitude (CoA) modeling and dedicated LLMs' persuasion assessment. This approach enables low-cost generation of training data without human annotation, addressing key industry challenges such as lack of user data, cold-start evaluation difficulties, and prompt inefficiency. Applied to a real-world marketing scenario, MADS significantly improved the persuasion capacity of small LLMs, increasing the organic traffic conversion rate by 22.4\% (from 1.83\% to 2.24\%) , demonstrating clear business value.
Authors:Jade Hak, Nathaniel Lam Johnson, Matin Amoozadeh, Amin Alipour, Souti Chattopadhyay
Abstract:
Large Language Models (LLMs) such as ChatGPT have quickly become part of student programmers' toolkits, whether allowed by instructors or not. This paper examines how introductory programming (CS1) students integrate LLMs into their problem-solving processes. We conducted a mixed-methods study with 14 undergraduates completing three programming tasks while thinking aloud and permitted to access any resources they choose. The tasks varied in open-endedness and familiarity to the participants and were followed by surveys and interviews. We find that students frequently adopt a pattern we call pseudo-apprenticeship, where students engage attentively with expert-level solutions provided by LLMs but fail to participate in the stages of cognitive apprenticeship that promote independent problem-solving. This pattern was augmented by disconnects between students' intentions, actions, and self-perceived behavior when using LLMs. We offer design and instructional interventions for promoting learning and addressing the patterns of dependent AI use observed.
Authors:Leni Yang, Zezhong Wang, Xingyu Lan
Abstract:
We have witnessed rapid growth in data storytelling research. Scholars from multiple disciplines have contributed new theories and techniques surrounding data storytelling. However, with this prolific development, a fuzzy boundary of data storytelling comes. We argue that understanding how "data storytelling" has been defined and interpreted by academia is crucial for facilitating communication between researchers, encouraging the consistent use of concepts and measures, assisting newcomers in approaching and positioning their research in this area, and enabling the effective application of relevant techniques and tools. Thus, it is necessary to systematically reflect on "what is data storytelling" and promote a more thorough understanding of this concept. Specifically, we investigated how existing research has conceptualized "data storytelling." As a result, we identified 96 publications that provide explicit definitions. By coding these definitions in-depth, we identified five paradigms of defining data storytelling, as well as a broad spectrum of interpretations regarding the content, objectives, and techniques of data storytelling. Finally, we concluded with implications for future research, aiming to foster nuanced communication about "data storytelling," suggest research opportunities, and establish a more inclusive theoretical foundation for this research direction.
Authors:Morita Tarvirdians, Senthil Chandrasegaran, Hayley Hung, Catholijn M. Jonker, Catharine Oertel
Abstract:
When making significant life decisions, people increasingly turn to conversational AI tools, such as large language models (LLMs). However, LLMs often steer users toward solutions, limiting metacognitive awareness of their own decision-making. In this paper, we shift the focus in decision support from solution-orientation to reflective activity, coining the term pre-decision reflection (PDR). We introduce PROBE, the first framework that assesses pre-decision reflections along two dimensions: breadth (diversity of thought categories) and depth (elaborateness of reasoning). Coder agreement demonstrates PROBE's reliability in capturing how people engage in pre-decision reflection. Our study reveals substantial heterogeneity across participants and shows that people perceived their unassisted reflections as deeper and broader than PROBE's measures. By surfacing hidden thought patterns, PROBE opens opportunities for technologies that foster self-awareness and strengthen people's agency in choosing which thought patterns to rely on in decision-making.
Authors:Jaeryang Baek, Ayana Hussain, Danny Liu, Nicholas Vincent, Lawrence H. Kim
Abstract:
While LLMs enable a range of AI applications, interacting with multiple models and customizing workflows can be challenging, and existing LLM interfaces offer limited support for collaborative extension or real-world evaluation. In this work, we present an interface toolkit for LLMs designed to be open (open-source and local), extensible (plugin support and users can interact with multiple models), and usable. The extensibility is enabled through a two-pronged plugin architecture and a community platform for sharing, importing, and adapting extensions. To evaluate the system, we analyzed organic engagement through social platforms, conducted a user survey, and provided notable examples of the toolkit in the wild. Through studying how users engage with and extend the toolkit, we show how extensible, open LLM interfaces provide both functional and social value, and highlight opportunities for future HCI work on designing LLM toolkit platforms and shaping local LLM-user interaction.
Authors:Liang-Yuan Wu, Dhruv Jain
Abstract:
Automatic Speech Recognition (ASR) systems often fail to accurately transcribe speech from Deaf and Hard of Hearing (DHH) individuals, especially during real-time conversations. Existing personalization approaches typically require extensive pre-recorded data and place the burden of adaptation on the DHH speaker. We present EvolveCaptions, a real-time, collaborative ASR adaptation system that supports in-situ personalization with minimal effort. Hearing participants correct ASR errors during live conversations. Based on these corrections, the system generates short, phonetically targeted prompts for the DHH speaker to record, which are then used to fine-tune the ASR model. In a study with 12 DHH and six hearing participants, EvolveCaptions reduced Word Error Rate (WER) across all DHH users within one hour of use, using only five minutes of recording time on average. Participants described the system as intuitive, low-effort, and well-integrated into communication. These findings demonstrate the promise of collaborative, real-time ASR adaptation for more equitable communication.
Authors:Saketh Ram Kasibatla, Kiran Medleri Hiremath, Raven Rothkopf, Sorin Lerner, Haijun Xia, Brian Hempel
Abstract:
Although birthed in the era of teletypes, the command line shell survived the graphical interface revolution of the 1980's and lives on in modern desktop operating systems. The command line provides access to powerful functionality not otherwise exposed on the computer, but requires users to recall textual syntax and carefully scour documentation. In contrast, graphical interfaces let users organically discover and invoke possible actions through widgets and menus. To better expose the power of the command line, we demonstrate a mechanism for automatically creating graphical interfaces for command line tools by translating their documentation (in the form of man pages) into interface specifications via AI. Using these specifications, our user-facing system, called GUIde, presents the command options to the user graphically. We evaluate the generated interfaces on a corpus of commands to show to what degree GUIde offers thorough graphical interfaces for users' real-world command line tasks.
Authors:Rodrigo Lankaites Pinheiro, Dario Landa-Silva, Jason Atkin
Abstract:
Understanding the relationships between objectives in a multiobjective optimisation problem is important for developing tailored and efficient solving techniques. In particular, when tackling combinatorial optimisation problems with many objectives, that arise in real-world logistic scenarios, better support for the decision maker can be achieved through better understanding of the often complex fitness landscape. This paper makes a contribution in this direction by presenting a technique that allows a visualisation and analysis of the local and global relationships between objectives in optimisation problems with many objectives. The proposed technique uses four steps: First, the global pairwise relationships are analysed using the Kendall correlation method; then, the ranges of the values found on the given Pareto front are estimated and assessed; next, these ranges are used to plot a map using Gray code, similar to Karnaugh maps, that has the ability to highlight the trade-offs between multiple objectives; and finally, local relationships are identified using scatter plots. Experiments are presented for three combinatorial optimisation problems: multiobjective multidimensional knapsack problem, multiobjective nurse scheduling problem, and multiobjective vehicle routing problem with time windows . Results show that the proposed technique helps in the gaining of insights into the problem difficulty arising from the relationships between objectives.
Authors:Xiaoye Michael Wang, Ali Mazalek, Catherine M. Sabiston, Timothy N. Welsh
Abstract:
Virtual reality (VR) introduces sensory perturbations that may impact perception and action. The current study was designed to investigate how immersive VR presented through a head-mounted display (HMD) affects perceived functional body size using a passable aperture paradigm. Participants (n=60) performed an action task (sidle through apertures) and a perception task (adjust aperture width until passable without contact) in both physical, unmediated reality (UR) and VR. Results revealed significantly higher action and perceptual thresholds in VR compared to UR. Affordance ratios (perceptual threshold over action threshold) were also higher in VR, indicating that the increase in perceptual thresholds in VR was driven partly by sensorimotor uncertainty, as reflected in the increase in the action thresholds, and partly by perceptual distortions imposed by VR. This perceptual overestimation in VR also persisted as an aftereffect in UR following VR exposure. Geometrical modelling attributed the disproportionate increase in the perceptual threshold in VR primarily to depth compression. This compression, stemming from the vergence-accommodation conflict (VAC), caused the virtual aperture to be perceived as narrower than depicted, thus requiring a wider adjusted aperture. Critically, after mathematically correcting for the VAC's impact on perceived aperture width, the affordance ratios in VR became equivalent to those in UR. These outcomes demonstrate a recovered invariant geometrical scaling, suggesting that perception remained functionally attuned to action capabilities once VAC-induced distortions were accounted for. These findings highlight that VR-induced depth compression systematically alters perceived body-environment relationships, leading to an altered sense of one's functional body size.
Authors:Ke Er Amy Zhang, David Grellscheid, Laura Garrison
Abstract:
We propose a design space for data melodification, where standard visualization idioms and fundamental data characteristics map to rhetorical devices of music for a more affective experience of data. Traditional data sonification transforms data into sound by mapping it to different parameters such as pitch, volume, and duration. Often and regrettably, this mapping leaves behind melody, harmony, rhythm and other musical devices that compose the centuries-long persuasive and expressive power of music. What results is the occasional, unintentional sense of tinnitus and horror film-like impending doom caused by a disconnect between the semantics of data and sound. Through this work we ask, can the aestheticization of sonification through (classical) music theory make data simultaneously accessible, meaningful, and pleasing to one's ears?
Authors:Xiang Chang, Zhijie Yi, Yichang Liu, Hongling Sheng, Dengbo He
Abstract:
This study investigates how pedestrian trust, receptivity, and behavior evolve during interactions with Level-4 autonomous vehicles (AVs) at uncontrolled urban intersections in a naturalistic setting. While public acceptance is critical for AV adoption, most prior studies relied on simplified simulations or field tests. We conducted a real-world experiment in a commercial Robotaxi operation zone, where 33 participants repeatedly crossed an uncontrolled intersection with frequent Level-4 Robotaxi traffic. Participants completed the Pedestrian Behavior Questionnaire (PBQ), Pedestrian Receptivity Questionnaire for Fully AVs (PRQF), pre- and post-experiment Trust in AVs Scale, and Personal Innovativeness Scale (PIS). Results showed that trust in AVs significantly increased post-experiment, with the increase positively associated with the Interaction component of PRQF. Additionally, both the Positive and Error subscales of the PBQ significantly influenced trust change. This study reveals how trust forms in real-world pedestrian-AV encounters, offering insights beyond lab-based research by accounting for population heterogeneity.
Authors:Elizabeth McKinnie, Anas Buhayh, Clement Canel, Robin Burke
Abstract:
Ensuring fair outcomes for multiple stakeholders in recommender systems has been studied mostly in terms of algorithmic interventions: building new models with better fairness properties, or using reranking to improve outcomes from an existing algorithm. What has rarely been studied is structural changes in the recommendation ecosystem itself. Our work explores the fairness impact of algorithmic pluralism, the idea that the recommendation algorithm is decoupled from the platform through which users access content, enabling user choice in algorithms. Prior work using a simulation approach has shown that niche consumers and (especially) niche providers benefit from algorithmic choice. In this paper, we use simulation to explore the question of profile portability, to understand how different policies regarding the handling of user profiles interact with fairness outcomes for consumers and providers.
Authors:Esen K. Tütüncü, Lissette Lemus, Kris Pilcher, Holger Sprengel, Jordi Sabater-Mir
Abstract:
Commonaiverse is an interactive installation exploring human emotions through full-body motion tracking and real-time AI feedback. Participants engage in three phases: Teaching, Exploration and the Cosmos Phase, collaboratively expressing and interpreting emotions with the system. The installation integrates MoveNet for precise motion tracking and a multi-recommender AI system to analyze emotional states dynamically, responding with adaptive audiovisual outputs. By shifting from top-down emotion classification to participant-driven, culturally diverse definitions, we highlight new pathways for inclusive, ethical affective computing. We discuss how this collaborative, out-of-the-box approach pushes multimedia research beyond single-user facial analysis toward a more embodied, co-created paradigm of emotional AI. Furthermore, we reflect on how this reimagined framework fosters user agency, reduces bias, and opens avenues for advanced interactive applications.
Authors:Ruishan Wu, Zhuoyang Li, Charles Perin, Sheelagh Carpendale, Can Liu
Abstract:
Personal Affective Physicalization is the process by which individuals express emotions through tangible forms to record, reflect on, and communicate. Yet such physical data representations can be challenging to design due to the abstract nature of emotions. Given the shown potential of AI in detecting emotion and assisting design, we explore opportunities in AI-assisted design of personal affective physicalization using a Research-through-Design method. We developed PhEmotion, a tool for embedding LLM-extracted emotion values from human-AI conversations into parametric design of physical artifacts. A lab study was conducted with 14 participants creating these artifacts based on their personal emotions, with and without AI support. We observed nuances and variations in participants' creative strategies, meaning-making processes and their perceptions of AI support in this context. We found key tensions in AI-human co-creation that provide a nuanced agenda for future research in AI-assisted personal affective physicalization.
Authors:Hellina Hailu Nigatu, Nuredin Ali Abdelkadir, Fiker Tewelde, Stevie Chancellor, Daricia Wilkinson
Abstract:
Data voids--areas of the internet where reliable information is scarce or absent--pose significant challenges to online health information seeking, particularly for users operating in low-web data languages. These voids are increasingly encountered not on traditional search engines alone, but on social media platforms, which have gradually morphed into informal search engines for millions of people. In this paper, we introduce the phenomenon of data horizons: a critical boundary where algorithmic structures begin to degrade the relevance and reliability of search results. Unlike the core of a data void, which is often exploited by bad actors to spread misinformation, the data horizon marks the critical space where systemic factors, such as linguistic underrepresentation, algorithmic amplification, and socio-cultural mismatch, create conditions of informational instability. Focusing on Tigrinya and Amharic as languages of study, we evaluate (1) the common characteristics of search results for health queries, (2) the quality and credibility of health information, and (3) characteristics of search results that diverge from their queries. We find that search results for health queries in low-web data languages may not always be in the language of search and may be dominated by nutritional and religious advice. We show that search results that diverge from their queries in low-resourced languages are due to algorithmic failures, (un)intentional manipulation, or active manipulation by content creators. We use our findings to illustrate how a data horizon manifests under several interacting constraints on information availability.
Authors:Rose E. Guingrich, Michael S. A. Graziano
Abstract:
Relationships with social artificial intelligence (AI) agents are on the rise. People report forming friendships, mentorships, and romantic partnerships with chatbots such as Replika, a type of social AI agent that is designed specifically for companionship. Concerns that companion chatbot relationships may harm or replace human ones have been raised, but whether and how these social consequences occur remains unclear. Prior research suggests that people's states of social need and their anthropomorphism of the AI agent may play a role in how human-AI interaction impacts human-human interaction. In this longitudinal study (N = 183), participants were randomly assigned to converse with a companion chatbot over text or to play text-based word games for 10 minutes a day for 21 consecutive days. During these 21 days, participants also completed four surveys and two audio-recorded interviews. We found that people's social health and relationships were not significantly impacted by interacting with a companion chatbot across 21 days compared to the control group. However, people who had a higher desire to socially connect anthropomorphized the chatbot more. Those who anthropomorphized the chatbot more indicated that the human-chatbot interaction had greater impacts on their social interactions and relationships with family and friends. A mediation analysis suggested that the impact of human-AI interaction on human-human social outcomes was mediated by the extent to which people anthropomorphized the AI agent, which itself was related to the desire to socially connect.
Authors:Maxime Manderlier, Fabian Lecron, Olivier Vu Thanh, Nicolas Gillis
Abstract:
We investigate whether large language models (LLMs) can generate effective, user-facing explanations from a mathematically interpretable recommendation model. The model is based on constrained matrix factorization, where user types are explicitly represented and predicted item scores share the same scale as observed ratings, making the model's internal representations and predicted scores directly interpretable. This structure is translated into natural language explanations using carefully designed LLM prompts. Many works in explainable AI rely on automatic evaluation metrics, which often fail to capture users' actual needs and perceptions. In contrast, we adopt a user-centered approach: we conduct a study with 326 participants who assessed the quality of the explanations across five key dimensions-transparency, effectiveness, persuasion, trust, and satisfaction-as well as the recommendations themselves. To evaluate how different explanation strategies are perceived, we generate multiple explanation types from the same underlying model, varying the input information provided to the LLM. Our analysis reveals that all explanation types are generally well received, with moderate statistical differences between strategies. User comments further underscore how participants react to each type of explanation, offering complementary insights beyond the quantitative results.
Authors:Anukriti Kumar, Tanushree Padath, Lucy Lu Wang
Abstract:
PDFs remain the dominant format for scholarly communication, despite significant accessibility challenges for blind and low-vision users. While various tools attempt to evaluate PDF accessibility, there is no standardized methodology to evaluate how different accessibility assessment approaches perform. Our work addresses this critical gap by introducing a novel benchmark dataset of scholarly PDFs with expert-validated accessibility annotations across seven criteria (alternative text quality, logical reading order, semantic tagging, table structure, functional hyperlinks, color contrast, and font readability), and a four-category evaluation framework with standardized labels (Passed, Failed, Not Present, Cannot Tell) to systematically assess accessibility evaluation approaches. Using our evaluation framework, we explore whether large language models (LLMs) are capable of supporting automated accessibility evaluation. We benchmark five LLMs, which demonstrate varying capabilities in correctly assessing different accessibility criteria, with GPT-4-Turbo achieving the highest overall accuracy (0.85). However, all models struggled in correctly categorizing documents with Not Present and Cannot Tell accessibility labels, particularly for alt text quality assessment. Our qualitative comparison with standard automated checkers reveals complementary strengths: rule-based tools excel at technical verification, while LLMs better evaluate semantic appropriateness and contextual relevance. Based on our findings, we propose a hybrid approach that would combine automated checkers, LLM evaluation, and human assessment as a future strategy for PDF accessibility evaluation.
Authors:Christopher Cofie Kuzagbe, Fabrice Mukarage, Skye Nandi Adams, N'guessan Yves-Roland Douha, Edith Talina Luhanga
Abstract:
Background: Approximately 1 in 100 children worldwide are diagnosed with Autism Spectrum Disorder (ASD), and 46% to 89% experience significant feeding difficulties. Mobile health applications (mHealth apps) have emerged as a potential tool for scalable support. However, their quality and relevance in managing ASD-related feeding challenges remain unclear. Objective: To identify and evaluate the quality of mHealth apps available in the Africa region addressing feeding difficulties in children with ASD. Methods: A systematic search was conducted on the Apple App Store and Google Play Store between September and October 2024. Applications were included if they were free, in English, updated within the past year, explicitly focused on feeding in children with autism, available in the Africa region, and had more than 100 downloads. Eligible apps were assessed using the Behavior Change Wheel (BCW) framework and rated with the Mobile App Rating Scale (MARS) across four domains: engagement, functionality, aesthetics, and information quality. Results: Of the 326 applications identified, only two iOS apps met all inclusion criteria. EduKitchen-Toddlers Food Games featured child-centered interactive games and sensory-friendly visuals, while Autism Food Coach 2 provided structured caregiver tools, visual meal plans, and progress tracking. Both apps aligned with multiple BCW intervention functions, including education, training, and enablement. MARS scores of 3.7 and 3.9 indicated acceptable to good usability and content quality. Conclusion: There is a critical shortage of high-quality, evidence-based mHealth applications addressing feeding difficulties in children with ASD. Future development should prioritize clinical validation and the integration of comprehensive, caregiver-centered support features to address this gap.
Authors:Ajay Narayanan Sridhar, Fuli Qiao, Nelson Daniel Troncoso Aldas, Yanpei Shi, Mehrdad Mahdavi, Laurent Itti, Vijaykrishnan Narayanan
Abstract:
People with visual impairments often face significant challenges in locating and retrieving objects in their surroundings. Existing assistive technologies present a trade-off: systems that offer precise guidance typically require pre-scanning or support only fixed object categories, while those with open-world object recognition lack spatial feedback for reaching the object. To address this gap, we introduce 'NaviSense', a mobile assistive system that combines conversational AI, vision-language models, augmented reality (AR), and LiDAR to support open-world object detection with real-time audio-haptic guidance. Users specify objects via natural language and receive continuous spatial feedback to navigate toward the target without needing prior setup. Designed with insights from a formative study and evaluated with 12 blind and low-vision participants, NaviSense significantly reduced object retrieval time and was preferred over existing tools, demonstrating the value of integrating open-world perception with precise, accessible guidance.
Authors:Joel Miller, E. Glen Weyl, Chris Kanich
Abstract:
We discuss an algorithmic intervention aimed at increasing equity and economic efficiency at a crowdfunding platform that gives cash subsidies to grantees. Through a blend of technical and qualitative methods, we show that the previous algorithm used by the platform -- Quadratic Funding (QF) -- suffered problems because its design was rooted in a model of individuals as isolated and selfish. We present an alternative algorithm -- Connection-Oriented Quadratic Funding (CO-QF) -- rooted in a theory of plurality and prosocial utilities, and show that it qualitatively and quantitatively performs better than QF. CO-QF has achieved an 89% adoption rate at the platform and has distributed over $4 Million to date. In simulations we show that it provides better social welfare than QF. While our design for CO-QF was responsive to the needs of a specific community, we also extrapolate out of this context to show that CO-QF is a potentially helpful tool for general-purpose public decision making.
Authors:Mian Zhong, Pristina Wang, Anjalie Field
Abstract:
Despite numerous applications for fine-grained corpus analysis, researchers continue to rely on manual labeling, which does not scale, or statistical tools like topic modeling, which are difficult to control. We propose that LLMs have the potential to scale the nuanced analyses that researchers typically conduct manually to large text corpora. To this effect, inspired by qualitative research methods, we develop HICode, a two-part pipeline that first inductively generates labels directly from analysis data and then hierarchically clusters them to surface emergent themes. We validate this approach across three diverse datasets by measuring alignment with human-constructed themes and demonstrating its robustness through automated and human evaluations. Finally, we conduct a case study of litigation documents related to the ongoing opioid crisis in the U.S., revealing aggressive marketing strategies employed by pharmaceutical companies and demonstrating HICode's potential for facilitating nuanced analyses in large-scale data.
Authors:Zaid Hakami, Yuzhou Feng, Bogdan Carbunar
Abstract:
Censorship and the distribution of false information, tools used to manipulate what users see and believe, are seemingly at opposite ends of the information access spectrum. Most previous work has examined them in isolation and within individual countries, leaving gaps in our understanding of how these information manipulation tools interact and reinforce each other across diverse societies. In this paper, we study perceptions about the interplay between censorship, false information, and influence operations, gathered through a mixed-methods study consisting of a survey (n = 384) and semi-structured interviews (n = 30) with participants who have experienced these phenomena across diverse countries in both the Global South and Global North, including Bangladesh, China, Cuba, Iran, Venezuela, and the United States. Our findings reveal perceptions of cooperation across various platforms between distinct entities working together to create information cocoons, within which censorship and false information become imperceptible to those affected. Building on study insights, we propose novel platform-level interventions to enhance transparency and help users navigate information manipulation. In addition, we introduce the concept of plausibly deniable social platforms, enabling censored users to provide credible, benign explanations for their activities, protecting them from surveillance and coercion.
Authors:Kanato Masayoshi, Masahiro Hashimoto, Ryoichi Yokoyama, Naoki Toda, Yoshifumi Uwamino, Shogo Fukuda, Ho Namkoong, Masahiro Jinzaki
Abstract:
Background: Large language models (LLMs) show promise in medicine, but their deployment in hospitals is limited by restricted access to electronic health record (EHR) systems. The Model Context Protocol (MCP) enables integration between LLMs and external tools. Objective: To evaluate whether an LLM connected to an EHR database via MCP can autonomously retrieve clinically relevant information in a real hospital setting. Methods: We developed EHR-MCP, a framework of custom MCP tools integrated with the hospital EHR database, and used GPT-4.1 through a LangGraph ReAct agent to interact with it. Six tasks were tested, derived from use cases of the infection control team (ICT). Eight patients discussed at ICT conferences were retrospectively analyzed. Agreement with physician-generated gold standards was measured. Results: The LLM consistently selected and executed the correct MCP tools. Except for two tasks, all tasks achieved near-perfect accuracy. Performance was lower in the complex task requiring time-dependent calculations. Most errors arose from incorrect arguments or misinterpretation of tool results. Responses from EHR-MCP were reliable, though long and repetitive data risked exceeding the context window. Conclusions: LLMs can retrieve clinical data from an EHR via MCP tools in a real hospital setting, achieving near-perfect performance in simple tasks while highlighting challenges in complex ones. EHR-MCP provides an infrastructure for secure, consistent data access and may serve as a foundation for hospital AI agents. Future work should extend beyond retrieval to reasoning, generation, and clinical impact assessment, paving the way for effective integration of generative AI into clinical practice.
Authors:David Shoresh, Yonatan Loewenstein
Abstract:
Your company's CEO is retiring. You search for a successor. You can promote an employee from the company familiar with the company's operations, or recruit an external professional manager. Who should you prefer? It has not been clear how to address this question, the "subject matter expertise vs. professional manager debate", quantitatively and objectively. We note that a company's success depends on long sequences of interdependent decisions, with often-opposing recommendations of diverse board members. To model this task in a controlled environment, we utilize chess - a complex, sequential game with interdependent decisions which allows for quantitative analysis of performance and expertise (since the states, actions and game outcomes are well-defined). The availability of chess engines differing in style and expertise, allows scalable experimentation. We considered a team of (computer) chess players. At each turn, team members recommend a move and a manager chooses a recommendation. We compared the performance of two manager types. For manager as "subject matter expert", we used another (computer) chess player that assesses the recommendations of the team members based on its own chess expertise. We examined the performance of such managers at different strength levels. To model a "professional manager", we used Reinforcement Learning (RL) to train a network that identifies the board positions in which different team members have relative advantage, without any pretraining in chess. We further examined this network to see if any chess knowledge is acquired implicitly. We found that subject matter expertise beyond a minimal threshold does not significantly contribute to team synergy. Moreover, performance of a RL-trained "professional" manager significantly exceeds that of even the best "expert" managers, while acquiring only limited understanding of chess.
Authors:Seungwon Yang, Suwon Yoon, Jeongwon Choi, Inseok Hwang
Abstract:
Augmented reality (AR) is often realized through head-mounted displays, offering immersive but egocentric experiences. While smartphone-based AR is more accessible, it remains limited to handheld, single-user interaction. We introduce Chameleon, a prototype AR system that transforms smartphones into surface-anchored displays for co-located use. When placed flat, the phone creates a transparency illusion and anchors digital content visible to multiple users. Chameleon supports natural repositioning on the surface without external hardware by combining two techniques: (1) Background Acquisition uses opportunistic sensing and language model-assisted pattern generation to blend with surrounding surfaces, and (2) Real-Time Position Tracking augments inertial sensing to maintain spatial stability. This work shows how lightweight sensing can support casual, collaborative AR experiences using existing devices.
Authors:Botao Amber Hu, Yilan Elan Tao, Rem RunGu Lin, Mingze Chai, Yuemin Huang, Rakesh Patibanda
Abstract:
Mixed Reality (MR) experiences increasingly explore how virtual elements can shape physical behaviour, yet how MR objects guide group movement remains underexplored. We address this gap by examining how virtual objects can nudge collective, co-located movement without relying on explicit instructions or choreography. We developed GravField, a co-located MR performance system where an "object jockey" live-configures virtual objects, springs, ropes, magnets, with real-time, parameterised "digital physics" (e.g., weight, elasticity, force) to influence the movement of headset-wearing participants. These properties were made perceptible through augmented visual and audio feedback, creating dynamic cognitive-somatic cues. Our analysis of the performances, based on video, interviews, soma trajectories, and field notes, indicates that these live nudges support emergent intercorporeal coordination and that ambiguity and real-time configuration sustain open-ended, exploratory engagement. Ultimately, our work offers empirical insights and design principles for MR systems that can guide group movement through embodied, felt dynamics.
Authors:Wael Albayaydh, Ivan Flechais, Rui Zhao, Jood Albayaydh
Abstract:
Privacy concerns and fears of unauthorized access in smart home devices often stem from misunderstandings about how data is collected, used, and protected. This study explores how AI-powered tools can offer innovative privacy protections through clear, personalized, and contextual support to users. Through 23 in-depth interviews with users, AI developers, designers, and regulators, and using Grounded Theory analysis, we identified two key themes: Aspirations for AI-Enhanced Privacy - how users perceive AI's potential to empower them, address power imbalances, and improve ease of use- and AI Ethical, Security, and Regulatory Considerations-challenges in strengthening data security, ensuring regulatory compliance, and promoting ethical AI practices. Our findings contribute to the field by uncovering user aspirations for AI-driven privacy solutions, identifying key security and ethical challenges, and providing actionable recommendations for all stakeholders, particularly targeting smart device designers and AI developers, to guide the co-design of AI tools that enhance privacy protection in smart home devices. By bridging the gap between user expectations, AI capabilities, and regulatory frameworks, this work offers practical insights for shaping the future of privacy-conscious AI integration in smart homes.
Authors:JooYoung Seo, Saairam Venkatesh, Daksh Pokar, Sanchita Kamath, Krishna Anandan Ganesan
Abstract:
Although recent efforts have developed accessible data visualization tools for blind and low-vision (BLV) users, most follow a "design for them" approach that creates an unintentional divide between sighted creators and BLV consumers. This unidirectional paradigm perpetuates a power dynamic where sighted creators produce non-visual content boundaries for BLV consumers to access. This paper proposes a bidirectional approach, "design for us," where both sighted and BLV collaborators can employ the same tool to create, interpret, and communicate data visualizations for each other. We introduce Py maidr, a Python package that seamlessly encodes multimodal (e.g., tactile, auditory, conversational) data representations into visual plots generated by Matplotlib and Seaborn. By simply importing the maidr package and invoking the maidr.show() method, users can generate accessible plots with minimal changes to their existing codebase regardless of their visual dis/abilities. Our technical case studies demonstrate how this tool is scalable and can be integrated into interactive computing (e.g., Jupyter Notebook, Google Colab), reproducible and literate programming (e.g., Quarto), and reactive dashboards (e.g., Shiny, Streamlit). Our performance benchmarks demonstrate that Py maidr introduces minimal and consistent overhead during the rendering and export of plots against Matplotlib and Seaborn baselines. This work significantly contributes to narrowing the accessibility gap in data visualization by providing a unified framework that fosters collaboration and communication between sighted and BLV individuals.
Authors:Yuan Xu, Shaowen Xiang, Yizhi Song, Ruoting Sun, Xin Tong
Abstract:
Large Language Models are reshaping task automation, yet remain limited in complex, multi-step real-world tasks that require aligning with vague user intent and enabling dynamic user override. From a formative study with 12 participants, we found that end-users actively seek to shape generative interfaces rather than relying on one-shot outputs. To address this, we introduce the human-agent co-generation paradigm, materialized in DuetUI. This LLM-empowered system unfolds alongside task progress through a bidirectional context loop--the agent scaffolds the interface by decomposing the task, while the user's direct manipulations implicitly steer the agent's next generation step. In a user study with 24 participants, DuetUI significantly improved task efficiency and interface usability compared to a baseline, fostering seamless human-agent collaboration. Our contributions include the proposal and validation of this novel paradigm, the design of the DuetUI prototype embodying it, and empirical insights into how this bidirectional loop better aligns agents with human intent.
Authors:Anshul Shah, Thomas Rexin, Elena Tomson, Leo Porter, William G. Griswold, Adalbert Gerald Soosai Raj
Abstract:
Motivation. Trust in generative AI programming assistants is a vital attitude that impacts how programmers use those programming assistants. Programmers that are over-trusting may be too reliant on their tools, leading to incorrect or vulnerable code; programmers that are under-trusting may avoid using tools that can improve their productivity and well-being. Methods. Since trust is a dynamic attitude that may change over time, this study aims to understand programmers' evolution of trust after immediate (one hour) and extended (10 days) use of GitHub Copilot. We collected survey data from 71 upper-division computer science students working on a legacy code base, representing a population that is about to enter the workforce. In this study, we quantitatively measure student trust levels and qualitatively uncover why student trust changes. Findings. Student trust, on average, increased over time. After completing a project with Copilot, however, students felt that Copilot requires a competent programmer to complete some tasks manually. Students mentioned that seeing Copilot's correctness, understanding how Copilot uses context from the code base, and learning some basics of natural language processing contributed to their elevated trust. Implications. Our study helps instructors and industry managers understand the factors that influence how students calibrate their trust with AI assistants. We make four pedagogical recommendations, which are that CS educators should 1) provide opportunities for students to work with Copilot on challenging software engineering tasks to calibrate their trust, 2) teach traditional skills of comprehending, debugging, and testing so students can verify output, 3) teach students about the basics of natural language processing, and 4) explicitly introduce and demonstrate the range of features available in Copilot.
Authors:Irina Bianca Serban, Dimitra Dritsa, David ten Cate, Loes Janssen, Margot Heijmans, Sara Colombo, Aarnout Brombacher, Steven Houben
Abstract:
Multimodal prehabilitation for colorectal cancer (CRC) surgery aims to optimize patient fitness and reduce postoperative complications. While telemonitoring's clinical value in supporting decision-making is recognized, patient perspectives on its use in prehabilitation remain underexplored, particularly compared to its related clinical context, rehabilitation. To address this gap, we conducted interviews with five patients who completed a four-week CRC prehabilitation program incorporating continuous telemonitoring. Our findings reveal patients' willingness to engage with telemonitoring, shaped by their motivations, perceived benefits, and concerns. We outline design considerations for patient-centered systems and offer a foundation for further research on telemonitoring in CRC prehabilitation.
Authors:Gülce KardeÅ, David Krakauer, Joshua Grochow
Abstract:
Cognitive science and theoretical computer science both seek to classify and explain the difficulty of tasks. Mechanisms of intelligence are those that reduce task difficulty. Here we map concepts from the computational complexity of a physical puzzle, the Soma Cube, onto cognitive problem-solving strategies through a ``Principle of Materiality''. By analyzing the puzzle's branching factor, measured through search tree outdegree, we quantitatively assess task difficulty and systematically examine how different strategies modify complexity. We incrementally refine a trial-and-error search by layering preprocessing (cognitive chunking), value ordering (cognitive free-sorting), variable ordering (cognitive scaffolding), and pruning (cognitive inference). We discuss how the competent use of artifacts reduces effective time complexity by exploiting physical constraints and propose a model of intelligence as a library of algorithms that recruit the capabilities of both mind and matter.
Authors:Karl Higley, Robin Burke, Michael D. Ekstrand, Bart P. Knijnenburg
Abstract:
One of the goals of recommender systems research is to provide insights and methods that can be used by practitioners to build real-world systems that deliver high-quality recommendations to actual people grounded in their genuine interests and needs. We report on our experience trying to apply the news recommendation literature to build POPROX, a live platform for news recommendation research, and reflect on the extent to which the current state of research supports system-building efforts. Our experience highlights several unexpected challenges encountered in building personalization features that are commonly found in products from news aggregators and publishers, and shows how those difficulties are connected to surprising gaps in the literature. Finally, we offer a set of lessons learned from building a live system with a persistent user base and highlight opportunities to make future news recommendation research more applicable and impactful in practice.
Authors:Hyeonggeun Yun, Jinkyu Jang
Abstract:
Although browser-using agents (BUAs) show promise for web tasks and automation, most BUAs terminate after executing a single instruction, failing to support users' complex, nonlinear browsing with ambiguous goals, iterative decision-making, and changing contexts. We present a human-in-the-loop (HITL) conceptual framework informed by theories of human web browsing behavior. The framework centers on an iterative loop in which the BUA proactively proposes next actions and the user steers the browsing process through feedback. It also distinguishes between exploration and exploitation actions, enabling users to control the breadth and depth of their browsing. Consequently, the framework aims to reduce users' physical and cognitive effort while preserving users' traditional browsing mental model and supporting users in achieving satisfactory outcomes. We illustrate how the framework operates with hypothetical use cases and discuss the shift from manual browsing to interaction-driven browsing. We contribute a theoretically informed conceptual framework for BUAs.
Authors:Alexandru Ternar, Alena Denisova, João M. Cunha, Annakaisa Kultima, Christian Guckelsberger
Abstract:
Generative Artificial Intelligence (GenAI) has had a tremendous impact on game production and promises lasting transformations. In the last five years since GenAI's inception, several studies, typically via qualitative methods, have explored its impact on game production from different settings and demographic angles. However, these studies often contextualise and consolidate their findings weakly with related work, and a big picture view is still missing. Here, we aim to provide such a view of GenAI's impact on game production in the form of a qualitative research synthesis via meta-ethnography. We followed PRISMA-S to systematically search the relevant literature from 2020-2025, including major HCI and games research databases. We then synthesised the 10 eligible studies, conducting reciprocal translation and line-of-argument synthesis guided by eMERGe, informed by CASP quality appraisal. We identified nine overarching themes, provide recommendations, and contextualise our insights in wider game production trends.
Authors:Gerlinde Emsenhuber, Tobias Langlotz, Denis Kalkofen, Markus Tatzgern
Abstract:
Image-based scene understanding allows Augmented Reality systems to provide contextual visual guidance in unprepared, real-world environments. While effective on video see-through (VST) head-mounted displays (HMDs), such methods suffer on optical see-through (OST) HMDs due to misregistration between the world-facing camera and the user's eye perspective. To approximate the user's true eye view, we implement and evaluate three software-based eye-perspective rendering (EPR) techniques on a commercially available, untethered OST HMD (Microsoft HoloLens 2): (1) Plane-Proxy EPR, projecting onto a fixed-distance plane; (2) Mesh-Proxy EPR, using SLAM-based reconstruction for projection; and (3) Gaze-Proxy EPR, a novel eye-tracking-based method that aligns the projection with the user's gaze depth. A user study on real-world tasks underscores the importance of accurate EPR and demonstrates gaze-proxy as a lightweight alternative to geometry-based methods. We release our EPR framework as open source.
Authors:Ziyi Wang, Ziwen Zeng, Yuan Li, Zijian Ding
Abstract:
Career exploration is uncertain, requiring decisions with limited information and unpredictable outcomes. While generative AI offers new opportunities for career guidance, most systems rely on linear chat interfaces that produce overly comprehensive and idealized suggestions, overlooking the non-linear and effortful nature of real-world trajectories. We present CareerPooler, a generative AI-powered system that employs a pool-table metaphor to simulate career development as a spatial and narrative interaction. Users strike balls representing milestones, skills, and random events, where hints, collisions, and rebounds embody decision-making under uncertainty. In a within-subjects study with 24 participants, CareerPooler significantly improved engagement, information gain, satisfaction, and career clarity compared to a chatbot baseline. Qualitative findings show that spatial-narrative interaction fosters experience-based learning, resilience through setbacks, and reduced psychological burden. Our findings contribute to the design of AI-assisted career exploration systems and more broadly suggest that visually grounded analogical interactions can make generative systems engaging and satisfying.
Authors:Zahra Borhani, Ali Ebrahimpour-Boroojeny, Francisco R. Ortega
Abstract:
Augmented Reality (AR) enables intuitive interaction with virtual annotations overlaid on the real world, supporting a wide range of applications such as remote assistance, education, and industrial training. However, as the number of heterogeneous annotations increases, their efficient retrieval remains an open challenge in 3D environments. This paper examines how interaction modalities and presentation designs affect user performance, workload, fatigue, and preference in AR annotation retrieval. In two user studies, we compare eye-gaze versus hand-ray hovering and evaluate four presentation methods: Opacity-based, Scale-based, Nothing-based, and Marker-based. Results show that eye-gaze was favored over hand-ray by users, despite leading to significantly higher unintentional activations. Among the presentation methods, Scale-based presentation reduces workload and task completion time while aligning with user preferences. Our findings offer empirical insights into the effectiveness of different annotation presentation methods, leading to design recommendations for building more efficient and user-friendly AR annotation review systems.
Authors:Sudhamshu Hosamane, Alyvia Walters, Yao Lyu, Shagun Jhaver
Abstract:
As social media platforms increasingly promote the use of user-enacted moderation tools (e.g., reporting, blocking, content filters) to address online harms, it becomes crucially important that such controls are usable for everyone. We evaluate the accessibility of these moderation tools on two mainstream platforms -- Facebook and X -- through interviews and task-based walkthroughs with 15 individuals with vision impairments. Adapting the lens of \emph{administrative burden of safety work}, we identify three interleaved costs that users with vision loss incur while interacting with moderation tools: \emph{learning costs} (understanding what controls do and where they live), \emph{compliance costs} (executing multi-step procedures under screen reader and low-vision conditions), and \emph{psychological costs} (experiencing uncertainty, stress, and diminished agency). Our analysis bridges the fields of content moderation and accessibility in HCI research and contributes (1) a cross-platform catalog of accessibility and usability breakdowns affecting safety tools; and (2) design recommendations for reducing this burden.
Authors:Zhuoyue Lyu, Per Ola Kristensson
Abstract:
Boundaries such as walls, windows, and doors are ubiquitous in the physical world, yet their potential in Mixed Reality (MR) remains underexplored. We present Unbounded, a Research through Design inquiry into Object-Boundary Interactions (OBIs). Building on prior work, we articulate a design space aimed at providing a shared language for OBIs. To demonstrate its potential, we design and implement eight examples across productivity and art exploration scenarios, showcasing how boundaries can enrich and reframe everyday interactions. We further engage with six MR experts in one-on-one feedback sessions, using the design space and examples as design probes. Their reflections broaden the conceptual scope of OBIs, reveal new possibilities for how the framework may be applied, and highlight implications for future MR interaction design.
Authors:Jie Li, Youyang Hou, Laura Lin, Ruihao Zhu, Hancheng Cao, Abdallah El Ali
Abstract:
Generative AI is reshaping UX design practices through "vibe coding," where UX professionals express intent in natural language and AI translates it into functional prototypes and code. Despite rapid adoption, little research has examined how vibe coding reconfigures UX workflows and collaboration. Drawing on interviews with 20 UX professionals across enterprises, startups, and academia, we show how vibe coding follows a four-stage workflow of ideation, AI generation, debugging, and review. This accelerates iteration, supports creativity, and lowers barriers to participation. However, professionals reported challenges of code unreliability, integration, and AI over-reliance. We find tensions between efficiency-driven prototyping ("intending the right design") and reflection ("designing the right intention"), introducing new asymmetries in trust, responsibility, and social stigma within teams. Through the lens of responsible human-AI collaboration for AI-assisted UX design and development, we contribute a deeper understanding of deskilling, ownership and disclosure, and creativity safeguarding in the age of vibe coding.
Authors:Mansi Sharma, Alexandre Duchevet, Florian Daiber, Jean-Paul Imbert, Maurice Rekrut
Abstract:
Unexpected events can impair attention and delay decision-making, posing serious safety risks in high-risk environments such as aviation. In particular, reactions like startle and surprise can impact pilot performance in different ways, yet are often hard to distinguish in practice. Existing research has largely studied these reactions separately, with limited focus on their combined effects or how to differentiate them using physiological data. In this work, we address this gap by distinguishing between startle and surprise events based on physiological signals using machine learning and multi-modal fusion strategies. Our results demonstrate that these events can be reliably predicted, achieving a highest mean accuracy of 85.7% with SVM and Late Fusion. To further validate the robustness of our model, we extended the evaluation to include a baseline condition, successfully differentiating between Startle, Surprise, and Baseline states with a highest mean accuracy of 74.9% with XGBoost and Late Fusion.
Authors:Amitabh Chakravorty, Jess Kropczynski, Nelly Elsayed
Abstract:
With the widespread adoption of cryptocurrencies, cryptojacking has become a significant security threat to crypto wallet users. This paper presents a front-end prototype of an AI-powered security dashboard, namely, CryptoGuard. Developed through a user-centered design process, the prototype was constructed as a high-fidelity, click-through model from Figma mockups to simulate key user interactions. It is designed to assist users in monitoring their login and transaction activity, identifying any suspicious behavior, and enabling them to take action directly within the wallet interface. The dashboard is designed for a general audience, prioritizing an intuitive user experience for non-technical individuals. Although its AI functionality is conceptual, the prototype demonstrates features like visual alerts and reporting. This work is positioned explicitly as a design concept, bridging cryptojacking detection research with human-centered interface design. This paper also demonstrates how usability heuristics can directly inform a tool's ability to support rapid and confident decision-making under real-world threats. This paper argues that practical security tools require not only robust backend functionality but also a user-centric design that communicates risk and empowers users to take meaningful action.
Authors:Racquel Fygenson, Lace Padilla, Enrico Bertini
Abstract:
Classically, affordance research investigates how the shape of objects communicates actions to potential users. Cognitive affordances, a subset of this research, characterize how the design of objects influences cognitive actions, such as information processing. Within visualization, cognitive affordances inform how graphs' design decisions communicate information to their readers. Although several related concepts exist in visualization, a formal translation of affordance theory to visualization is still lacking. In this paper, we review and translate affordance theory to visualization by formalizing how cognitive affordances operate within a visualization context. We also review common methods and terms, and compare related constructs to cognitive affordances in visualization. Based on a synthesis of research from psychology, human computer interaction, and visualization, we propose a framework of cognitive affordances in visualization that enumerates design decisions and reader characteristics that influence a visualization's hierarchy of communicated information. Finally, we demonstrate how this framework can guide the evaluation and redesign of visualizations.
Authors:Chang Liu, Loc Hoang, Andrew Stolman, Rene F. Kizilcec, Bo Wu
Abstract:
Providing students with flexible and timely academic support is a challenge at most colleges and universities, leaving many students without help outside scheduled hours. Large language models (LLMs) are promising for bridging this gap, but interactions between students and LLMs are rarely overseen by educators. We developed and studied an LLM-powered course assistant deployed across multiple computer science courses to characterize real-world use and understand pedagogical implications. By Spring 2024, our system had been deployed to approximately 2,000 students across six courses at three institutions. Analysis of the interaction data shows that usage remains strong in the evenings and nights and is higher in introductory courses, indicating that our system helps address temporal support gaps and novice learner needs. We sampled 200 conversations per course for manual annotation: most sampled responses were judged correct and helpful, with a small share unhelpful or erroneous; few responses included dedicated examples. We also examined an inquiry-based learning strategy: only around 11% of sampled conversations contained LLM-generated follow-up questions, which were often ignored by students in advanced courses. A Bloom's taxonomy analysis reveals that current LLM capabilities are limited in generating higher-order cognitive questions. These patterns suggest opportunities for pedagogically oriented LLM-based educational systems and greater educator involvement in configuring prompts, content, and policies.
Authors:Riccardo Bovo, Frederik Brudy, George Fitzmaurice, Fraser Anderson
Abstract:
Understanding transcripts of immersive multimodal conversations is challenging because speakers frequently rely on visual context and non-verbal cues, such as gestures and visual attention, which are not captured in speech alone. This lack of information makes coreferences resolution-the task of linking ambiguous expressions like ``it'' or ``there'' to their intended referents-particularly challenging. In this paper we present a system that augments VR speech transcript with eye-tracking laser pointing data, and scene metadata to generate textual descriptions of non-verbal communication and the corresponding objects of interest. To evaluate the system, we collected gaze, gesture, and voice data from 12 participants (6 pairs) engaged in an open-ended design critique of a 3D model of an apartment. Our results show a 26.5\% improvement in coreference resolution accuracy by a GPT model when using our multimodal transcript compared to a speech-only baseline.
Authors:Lena Cibulski, Fiete Haack, Adelinde Uhrmacher, Stefan Bruckner
Abstract:
The ability of a cell to communicate with its environment is essential for key cellular functions like replication, metabolism, or cell fate decisions. The involved molecular mechanisms are highly dynamic and difficult to capture experimentally. Simulation studies offer a valuable means for exploring and predicting how cell signaling processes unfold. We present a design study on the visual analysis of such studies to support 1) modelers in calibrating model parameters such that the simulated signal responses over time reflect reference behavior from cell biology research and 2) cell biologists in exploring the influence of receptor trafficking on the efficiency of signal transmission within the cell. We embed time series plots into parallel coordinates to enable a simultaneous analysis of model parameters and temporal outputs. A usage scenario illustrates how our approach assists with typical tasks such as assessing the plausibility of temporal outputs or their sensitivity across model configurations.
Authors:Can Liu, Shiwei Chen, Zhibang Jiang, Yong Wang
Abstract:
Expressive glyph visualizations provide a powerful and versatile means to represent complex multivariate data through compact visual encodings, but creating custom glyphs remains challenging due to the gap between design creativity and technical implementation. We present GlyphWeaver, a novel interactive system to enable an easy creation of expressive glyph visualizations. Our system comprises three key components: a glyph domain-specific language (GDSL), a GDSL operation management mechanism, and a multimodal interaction interface. The GDSL is a hierarchical container model, where each container is independent and composable, providing a rigorous yet practical foundation for complex glyph visualizations. The operation management mechanism restricts modifications of the GDSL to atomic operations, making it accessible without requiring direct coding. The multimodal interaction interface enables direct manipulation, natural language commands, and parameter adjustments. A multimodal large language model acts as a translator, converting these inputs into GDSL operations. GlyphWeaver significantly lowers the barrier for designers, who often do not have extensive programming skills, to create sophisticated glyph visualizations. A case study and user interviews with 13 participants confirm its substantial gains in design efficiency and effectiveness of producing creative glyph visualizations.
Authors:Ryan Lingo, Rajeev Chhajer, Martin Arroyo, Luka Brkljacic, Ben Davis, Nithin Santhanam
Abstract:
Large Language Models (LLMs) often produce monolithic text that is hard to edit in parts, which can slow down collaborative workflows. We present componentization, an approach that decomposes model outputs into modular, independently editable units while preserving context. We describe Modular and Adaptable Output Decomposition (MAOD), which segments responses into coherent components and maintains links among them, and we outline the Component-Based Response Architecture (CBRA) as one way to implement this idea. Our reference prototype, MAODchat, uses a microservices design with state-machine-based decomposition agents, vendor-agnostic model adapters, and real-time component manipulation with recomposition. In an exploratory study with four participants from academic, engineering, and product roles, we observed that component-level editing aligned with several common workflows and enabled iterative refinement and selective reuse. Participants also mentioned possible team workflows. Our contributions are: (1) a definition of componentization for transforming monolithic outputs into manipulable units, (2) CBRA and MAODchat as a prototype architecture, (3) preliminary observations from a small user study, (4) MAOD as an algorithmic sketch for semantic segmentation, and (5) example Agent-to-Agent protocols for automated decomposition. We view componentization as a promising direction for turning passive text consumption into more active, component-level collaboration.
Authors:Lan Xiao, Maryam Bandukda, Franklin Mingzhe Li, Mark Colley, Catherine Holloway
Abstract:
Video content creation offers vital opportunities for expression and participation, yet remains largely inaccessible to creators with sensory impairments, especially in low-resource settings. We conducted interviews with 20 video creators with visual and hearing impairments in Kenya to examine their tools, challenges, and collaborative practices. Our findings show that accessibility barriers and infrastructural limitations shape video creation as a staged, collaborative process involving trusted human partners and emerging AI tools. Across workflows, creators actively negotiated agency and trust, maintaining creative control while bridging sensory gaps. We discuss the need for flexible, interdependent collaboration models, inclusive human-AI workflows, and diverse storytelling practices. This work broadens accessibility research in HCI by examining how technology and social factors intersect in low-resource contexts, suggesting ways to better support disabled creators globally.
Authors:Juhoon Lee, Bich Ngoc Doan, Jonghyun Jee, Joseph Seering
Abstract:
Platforms are increasingly adopting industrial models of moderation that prioritize scalability and consistency, frequently at the expense of context-sensitive and user-centered values. Building on the multi-level governance framework that examines the interdependent relationship between platforms and middle-level communities, we investigate community appeals systems on Discord as a model for successful community-led governance. We investigate how Discord servers operationalize appeal systems through a qualitative interview study with focus groups and individual interviews with 17 community moderators. Our findings reveal a structured appeals process that balances scalability, fairness, and accountability while upholding community-centered values of growth and rehabilitation. Communities design these processes to empower users, ensuring their voices are heard in moderation decisions and fostering a sense of belonging. This research provides insights into the practical implementation of community-led governance in a multi-level governance framework, illustrating how communities can maintain their core principles while integrating procedural fairness and tool-based design. We discuss how platforms can gain insights from community-led moderation work to motivate governance structures that effectively balance and align the interests of multiple stakeholders.
Authors:Yannick Kalff, Katharina Simbeck
Abstract:
AI-based recommender systems increasingly influence recruitment decisions. Thus, transparency and responsible adoption in Human Resource Management (HRM) are critical. This study examines how HR managers' AI literacy influences their subjective perception and objective understanding of explainable AI (XAI) elements in recruiting recommender dashboards. In an online experiment, 410 German-based HR managers compared baseline dashboards to versions enriched with three XAI styles: important features, counterfactuals, and model criteria. Our results show that the dashboards used in practice do not explain AI results and even keep AI elements opaque. However, while adding XAI features improves subjective perceptions of helpfulness and trust among users with moderate or high AI literacy, it does not increase their objective understanding. It may even reduce accurate understanding, especially with complex explanations. Only overlays of important features significantly aided the interpretations of high-literacy users. Our findings highlight that the benefits of XAI in recruitment depend on users' AI literacy, emphasizing the need for tailored explanation strategies and targeted literacy training in HRM to ensure fair, transparent, and effective adoption of AI.
Authors:Ziling Chen, Yeo Jung Yoon, Rolando Bautista-Montesano, Zhen Zhao, Ajay Mandlekar, John Liu
Abstract:
Teleoperation offers a promising solution for enabling hands-on learning in remote education, particularly in environments requiring interaction with real-world equipment. However, such remote experiences can be costly or non-intuitive. To address these challenges, we present TeleopLab, a mobile device teleoperation system that allows students to control a robotic arm and operate lab equipment. TeleopLab comprises a robotic arm, an adaptive gripper, cameras, lab equipment for a diverse range of applications, a user interface accessible through smartphones, and video call software. We conducted a user study, focusing on task performance, students' perspectives toward the system, usability, and workload assessment. Our results demonstrate a 46.1% reduction in task completion time as users gained familiarity with the system. Quantitative feedback highlighted improvements in students' perspectives after using the system, while NASA TLX and SUS assessments indicated a manageable workload of 38.2 and a positive usability of 73.8. TeleopLab successfully bridges the gap between physical labs and remote education, offering a scalable and effective platform for remote STEM learning.
Authors:Jaroslaw Kornowicz, Maurice Pape, Kirsten Thommes
Abstract:
Prior research shows that social norms can reduce algorithm aversion, but little is known about how such norms become established. Most accounts emphasize technological and individual determinants, yet AI adoption unfolds within organizational social contexts shaped by peers and supervisors. We ask whether the source of the norm-peers or supervisors-shapes AI usage behavior. This question is practically relevant for organizations seeking to promote effective AI adoption. We conducted an online vignette experiment, complemented by qualitative data on participants' feelings and justifications after (counter-)normative behavior. In line with the theory, counter-normative choices elicited higher regret than norm-adherent choices. On average, choosing AI increased regret compared to choosing an human. This aversion was weaker when AI use was presented as the prevailing norm, indicating a statistically significant interaction between AI use and an AI-favoring norm. Participants also attributed less blame to technology than to humans, which increased regret when AI was chosen over human expertise. Both peer and supervisor influence emerged as relevant factors, though contrary to expectations they did not significantly affect regret. Our findings suggest that regret aversion, embedded in social norms, is a central mechanism driving imitation in AI-related decision-making.
Authors:Rodrigo Oliveira Zacarias, Rodrigo Pereira dos Santos, Patricia Lago
Abstract:
Software ecosystems (SECO) have become a dominant paradigm in the software industry, enabling third-party developers to co-create value through complementary components and services. While Developer Experience (DX) is increasingly recognized as critical for sustainable SECO, transparency remains an underexplored factor shaping how developers perceive and interact with ecosystems. Existing studies acknowledge transparency as essential for trust, fairness, and engagement, yet its relationship with DX has not been systematically conceptualized. Hence, this work aims to advance the understanding of transparency in SECO from a developer-centered perspective. To this end, we propose SECO-TransDX (Transparency in Software Ecosystems from a Developer Experience Perspective), a conceptual model that introduces the notion of DX-driven transparency. The model identifies 63 interrelated concepts, including conditioning factors, ecosystem procedures, artifacts, and relational dynamics that influence how transparency is perceived and constructed during developer interactions. SECO-TransDX was built upon prior research and refined through a Delphi study with experts from academia and industry. It offers a structured lens to examine how transparency mediates DX across technical, social, and organizational layers. For researchers, it lays the groundwork for future studies and tool development; for practitioners, it supports the design of trustworthy, developer-centered platforms that improve transparency and foster long-term engagement in SECO.
Authors:Mauro Diaz, Luis Sante, Joel Perca, João Victor da Silva, Nivan Ferreira, Jorge Poco
Abstract:
Effectively analyzing spatiotemporal data plays a central role in understanding real-world phenomena and informing decision-making. Capturing the interaction between spatial and temporal dimensions also helps explain the underlying structure of the data. However, most datasets do not reveal attribute relationships, requiring additional algorithms to extract meaningful patterns. Existing visualization tools often focus either on attribute relationships or spatiotemporal analysis, but rarely support both simultaneously. In this paper, we present STRive (SpatioTemporal Rule Interactive Visual Explorer), a visual analytics system that enables users to uncover and explore spatial and temporal patterns in data. At the core of STRive lies Association Rule Mining (ARM), which we apply to spatiotemporal datasets to generate interpretable and actionable insights. We combine ARM with multiple interactive mechanisms to analyze the extracted relationships. Association rules serve as interpretable guidance mechanisms for visual analytics by highlighting the meaningful aspects of the data that users should investigate. Our methodology includes three key steps: rule generation, rule clustering, and interactive visualization. STRive offers two modes of analysis. The first operates at the rule cluster level and includes four coordinated views, each showing a different facet of a cluster, including its temporal and spatial behavior. The second mode mirrors the first but focuses on individual rules within a selected cluster. We evaluate the effectiveness of STRive through two case studies involving real-world datasets -- fatal vehicle accidents and urban crime. Results demonstrate the system's ability to support the discovery and analysis of interpretable patterns in complex spatiotemporal contexts.
Authors:Caterina Fuster-Barcelo, Gonzalo R. Rios-Munoz, Arrate Munoz-Barrutia
Abstract:
This study examines the integration of digital collaborative tools and structured peer evaluation in the Machine Learning for Health master's program, through the redesign of a Biomedical Image Processing course over two academic years. The pedagogical framework combines real-time programming with Google Colab, experiment tracking and reporting via Weights & Biases, and rubric-guided peer assessment to foster student engagement, transparency, and fair evaluation. Compared to a pre-intervention cohort, the two implementation years showed increased grade dispersion and higher entropy in final project scores, suggesting improved differentiation and fairness in assessment. The survey results further indicate greater student engagement with the subject and their own learning process. These findings highlight the potential of integrating tool-supported collaboration and structured evaluation mechanisms to enhance both learning outcomes and equity in STEM education.
Authors:Hajnal Gyeviki, Mihály Minkó, Mary Karyda, Damla Ãay
Abstract:
Balaton Borders translates ecological data from Lake Balaton into ceramic tableware that represents human impact on the landscape, from reedbed reduction to shoreline modification and land erosion. Designed for performative dining, the pieces turn shared meals into multisensory encounters where food and data ceramics spark collective reflection on ecological disruption.
Authors:Sharjeel Tahir, Judith Johnson, Jumana Abu-Khalaf, Syed Afaq Ali Shah
Abstract:
A prevalent shortfall among current empathic AI systems is their inability to recognize when verbal expressions may not fully reflect underlying emotional states. This is because the existing datasets, used for the training of these systems, focus on surface-level emotion recognition without addressing the complex verbal-visual incongruence (mismatch) patterns useful for empathic understanding. In this paper, we present E-THER, the first Person-Centered Therapy-grounded multimodal dataset with multidimensional annotations for verbal-visual incongruence detection, enabling training of AI systems that develop genuine rather than performative empathic capabilities. The annotations included in the dataset are drawn from humanistic approach, i.e., identifying verbal-visual emotional misalignment in client-counsellor interactions - forming a framework for training and evaluating AI on empathy tasks. Additional engagement scores provide behavioral annotations for research applications. Notable gains in empathic and therapeutic conversational qualities are observed in state-of-the-art vision-language models (VLMs), such as IDEFICS and VideoLLAVA, using evaluation metrics grounded in empathic and therapeutic principles. Empirical findings indicate that our incongruence-trained models outperform general-purpose models in critical traits, such as sustaining therapeutic engagement, minimizing artificial or exaggerated linguistic patterns, and maintaining fidelity to PCT theoretical framework.
Authors:Shreyas Tirumala, Nishant Jain, Danny D. Leybzon, Trent D. Buskirk
Abstract:
Transformer-based Large Language Models (LLMs) have paved the way for "AI interviewers" that can administer voice-based surveys with respondents in real-time. This position paper reviews emerging evidence to understand when such AI interviewing systems are fit for purpose for collecting data within quantitative and qualitative research contexts. We evaluate the capabilities of AI interviewers as well as current Interactive Voice Response (IVR) systems across two dimensions: input/output performance (i.e., speech recognition, answer recording, emotion handling) and verbal reasoning (i.e., ability to probe, clarify, and handle branching logic). Field studies suggest that AI interviewers already exceed IVR capabilities for both quantitative and qualitative data collection, but real-time transcription error rates, limited emotion detection abilities, and uneven follow-up quality indicate that the utility, use and adoption of current AI interviewer technology may be context-dependent for qualitative data collection efforts.
Authors:Himanshu Verma, Kirtan Padh, Eva Thelisson
Abstract:
Auditability is defined as the capacity of AI systems to be independently assessed for compliance with ethical, legal, and technical standards throughout their lifecycle. The chapter explores how auditability is being formalized through emerging regulatory frameworks, such as the EU AI Act, which mandate documentation, risk assessments, and governance structures. It analyzes the diverse challenges facing AI auditability, including technical opacity, inconsistent documentation practices, lack of standardized audit tools and metrics, and conflicting principles within existing responsible AI frameworks. The discussion highlights the need for clear guidelines, harmonized international regulations, and robust socio-technical methodologies to operationalize auditability at scale. The chapter concludes by emphasizing the importance of multi-stakeholder collaboration and auditor empowerment in building an effective AI audit ecosystem. It argues that auditability must be embedded in AI development practices and governance infrastructures to ensure that AI systems are not only functional but also ethically and legally aligned.
Authors:Simon Burbach, Maria Maleshkova, Florian Centler, Tanja Joan Schmidt
Abstract:
Microbiomes are a vital part of the human body, engaging in tasks like food digestion and immune defense. Their structure and function must be understood in order to promote host health and facilitate swift recovery during disease. Due to the difficulties in experimentally studying these systems in situ, more research is being conducted in the field of mathematical modeling. Visualizing spatiotemporal data is challenging, and current tools that simulate microbial communities' spatial and temporal development often only provide limited functionalities, often requiring expert knowledge to generate useful results. To overcome these limitations, we provide a user-friendly tool to interactively explore spatiotemporal simulation data, called MicroLabVR, which transfers spatial data into virtual reality (VR) while following guidelines to enhance user experience (UX). With MicroLabVR, users can import CSV datasets containing population growth, substance concentration development, and metabolic flux distribution data. The implemented visualization methods allow users to evaluate the dataset in a VR environment interactively. MicroLabVR aims to improve data analysis for the user by allowing the exploration of microbiome data in their spatial context.
Authors:Imran S. A. Khan, Emmanuel G. Blanchard, Sébastien George
Abstract:
This paper introduces the Future Atmospheric Conditions Training System (FACTS), a novel platform that advances climate resilience education through place-based, adaptive learning experiences. FACTS combines real-time atmospheric data collected by IoT sensors with curated resources from a Knowledge Base to dynamically generate localized learning challenges. Learner responses are analyzed by a Generative AI powered server, which delivers personalized feedback and adaptive support. Results from a user evaluation indicate that participants found the system both easy to use and effective for building knowledge related to climate resilience. These findings suggest that integrating IoT and Generative AI into atmospherically adaptive learning technologies holds significant promise for enhancing educational engagement and fostering climate awareness.
Authors:Alekhya Gandu, Aakash Gautam
Abstract:
Community-based design efforts rightly seek to reduce the power differences between researchers and community participants by aligning with community values and furthering their priorities. However, what should designers do when key community members' practices seem to enact an oppressive and harmful structure? We reflect on our two-year-long engagement with a non-profit organization in southern India that supports women subjected to domestic abuse or facing mental health crises. We highlight the organizational gaps in knowledge management and transfer, which became an avenue for our design intervention. During design, we encountered practices that upheld caste hierarchies. These practices were expected to be incorporated into our technology. Anticipating harms to indirect stakeholders, we resisted this incorporation. It led to a breakdown in our relationship with the partner organization. Reflecting on this experience, we outline pluralistic pathways that community-based designers might inhabit when navigating value conflicts. These include making space for reflection before and during engagements, strategically repositioning through role reframing or appreciative inquiry, and exiting the engagement if necessary.
Authors:Lev Tankelevitch, Elena L. Glassman, Jessica He, Aniket Kittur, Mina Lee, Srishti Palani, Advait Sarkar, Gonzalo Ramos, Yvonne Rogers, Hari Subramonyam
Abstract:
Generative AI (GenAI) radically expands the scope and capability of automation for work, education, and everyday tasks, a transformation posing both risks and opportunities for human cognition. How will human cognition change, and what opportunities are there for GenAI to augment it? Which theories, metrics, and other tools are needed to address these questions? The CHI 2025 workshop on Tools for Thought aimed to bridge an emerging science of how the use of GenAI affects human thought, from metacognition to critical thinking, memory, and creativity, with an emerging design practice for building GenAI tools that both protect and augment human thought. Fifty-six researchers, designers, and thinkers from across disciplines as well as industry and academia, along with 34 papers and portfolios, seeded a day of discussion, ideation, and community-building. We synthesize this material here to begin mapping the space of research and design opportunities and to catalyze a multidisciplinary community around this pressing area of research.
Authors:Yibo Wang, Yuhan Luo, Janghee Cho, Junnan Yu
Abstract:
Recent advancements in geographic information systems and mixed reality technologies have positioned spatial computing as a transformative paradigm in computational science. However, the field remains conceptually fragmented, with diverse interpretations across disciplines like Human-Computer Interaction, Geographic Information Science, and Computer Science, which hinders a comprehensive understanding of spatial computing and poses challenges for its coherent advancement and interdisciplinary integration. In this paper, we trace the origins and historical evolution of spatial computing and examine how "spatial" is understood, identifying two schools of thought: "spatial" as the contextual understanding of space, where spatial data guides interaction in the physical world; and "spatial" as a mixed space for interaction, emphasizing the seamless integration of physical and digital environments to enable embodied engagement. By synthesizing these perspectives, we propose spatial computing as a computational paradigm that redefines the interplay between environment, computation, and human experience, offering a holistic lens to enhance its conceptual clarity and inspire future technological innovations that support meaningful interactions with and shaping of environments.
Authors:Jeremy Zhengqi Huang, Caluã de Lacerda Pataca, Liang-Yuan Wu, Dhruv Jain
Abstract:
Non-speech captions are essential to the video experience of deaf and hard of hearing (DHH) viewers, yet conventional approaches often overlook the diversity of their preferences. We present CapTune, a system that enables customization of non-speech captions based on DHH viewers' needs while preserving creator intent. CapTune allows caption authors to define safe transformation spaces using concrete examples and empowers viewers to personalize captions across four dimensions: level of detail, expressiveness, sound representation method, and genre alignment. Evaluations with seven caption creators and twelve DHH participants showed that CapTune supported creators' creative control while enhancing viewers' emotional engagement with content. Our findings also reveal trade-offs between information richness and cognitive load, tensions between interpretive and descriptive representations of sound, and the context-dependent nature of caption preferences.
Authors:Martin Benderoth, Patrick Gebhard, Christian Keller, C. Benjamin Nakhosteen, Stefan Schaffer, Tanja Schneeberger
Abstract:
This paper introduces a novel approach to tackle the challenges of preserving and transferring tacit knowledge--deep, experience-based insights that are hard to articulate but vital for decision-making, innovation, and problem-solving. Traditional methods rely heavily on human facilitators, which, while effective, are resource-intensive and lack scalability. A promising alternative is the use of Socially Interactive Agents (SIAs) as AI-driven knowledge transfer facilitators. These agents interact autonomously and socially intelligently with users through multimodal behaviors (verbal, paraverbal, nonverbal), simulating expert roles in various organizational contexts. SIAs engage employees in empathic, natural-language dialogues, helping them externalize insights that might otherwise remain unspoken. Their success hinges on building trust, as employees are often hesitant to share tacit knowledge without assurance of confidentiality and appreciation. Key technologies include Large Language Models (LLMs) for generating context-relevant dialogue, Retrieval-Augmented Generation (RAG) to integrate organizational knowledge, and Chain-of-Thought (CoT) prompting to guide structured reflection. These enable SIAs to actively elicit knowledge, uncover implicit assumptions, and connect insights to broader organizational contexts. Potential applications span onboarding, where SIAs support personalized guidance and introductions, and knowledge retention, where they conduct structured interviews with retiring experts to capture heuristics behind decisions. Success depends on addressing ethical and operational challenges such as data privacy, algorithmic bias, and resistance to AI. Transparency, robust validation, and a culture of trust are essential to mitigate these risks.
Authors:Paluck Deep, Monica Bharadhidasan, A. Baki Kocaballi
Abstract:
Personas have been widely used to understand and communicate user needs in human-centred design. Despite their utility, they may fail to meet the demands of iterative workflows due to their static nature, limited engagement, and inability to adapt to evolving design needs. Recent advances in large language models (LLMs) pave the way for more engaging and adaptive approaches to user representation. This paper introduces Interactive Virtual Personas (IVPs): multimodal, LLM-driven, conversational user simulations that designers can interview, brainstorm with, and gather feedback from in real time via voice interface. We conducted a qualitative study with eight professional UX designers, employing an IVP named "Alice" across three design activities: user research, ideation, and prototype evaluation. Our findings demonstrate the potential of IVPs to expedite information gathering, inspire design solutions, and provide rapid user-like feedback. However, designers raised concerns about biases, over-optimism, the challenge of ensuring authenticity without real stakeholder input, and the inability of the IVP to fully replicate the nuances of human interaction. Our participants emphasised that IVPs should be viewed as a complement to, not a replacement for, real user engagement. We discuss strategies for prompt engineering, human-in-the-loop integration, and ethical considerations for effective and responsible IVP use in design. Finally, our work contributes to the growing body of research on generative AI in the design process by providing insights into UX designers' experiences of LLM-powered interactive personas.
Authors:Ruhan Yang, Ellen Yi-Luen Do
Abstract:
We reviewed 43 papers to understand the fabrication of dynamic paper-based interactions. We used a design space to classify tool selection, technique choice, and exploration of paper as a material. We classified 9 dimensions for the design space, including 4 dimensions for tools (precision, accommodation, complexity, and availability), 3 dimensions for techniques (cutting techniques, folding techniques, and integration techniques), and 2 dimensions for paper as the material (paper weight and paper type). The patterns we observed in the design space indicate a majority use of high precision tools, high complexity tools, and surface integration techniques in previous practice. Meanwhile, printing and plain paper are the leading material choices. We analyze these patterns and suggest potential directions for future work. Our study helps researchers locate different fabrication approaches and instances, thus fostering innovation in the field of paper-based interaction.
Authors:Youngwon Choi, Donghyuk Jung, Hwayeon Kim
Abstract:
We present DESAMO, an on-device smart home system for elder-friendly use powered by Audio LLM, that supports natural and private interactions. While conventional voice assistants rely on ASR-based pipelines or ASR-LLM cascades, often struggling with the unclear speech common among elderly users and unable to handle non-speech audio, DESAMO leverages an Audio LLM to process raw audio input directly, enabling a robust understanding of user intent and critical events, such as falls or calls for help.
Authors:Laurie Gale, Sue Sentance
Abstract:
Debugging is often a challenging and infuriating experience for secondary school students learning their first text-based programming language. Many students resort to ineffective debugging strategies, making success with solving errors unlikely and emotional distress common. Developing tools that encourage students to adopt a more systematic and reflective approach to debugging is therefore an important, but lacking, area of research. This paper presents PRIMMDebug, a debugging teaching aid for secondary school students learning text-based programming. The aid consists of an online tool that takes students through the steps of a systematic debugging process based on PRIMM, a framework for teaching programming. The tool promotes a reflective approach to debugging by heavily encouraging students to articulate their thoughts throughout the PRIMMDebug process while simultaneously limiting their ability to run and edit code. To evaluate the tool, a set of students from four secondary schools were taught with PRIMMDebug over several lessons. Survey results and log data analysis show that students were generally reluctant to engage with the systematicity and reflection that the tool encourages. Given that related work on systematic debugging has reported similar challenges, we end by considering how these approaches could be refined to help more students benefit from them.
Authors:Ryan Hare, Ying Tang
Abstract:
One of the enduring challenges in education is how to empower students to take ownership of their learning by setting meaningful goals, tracking their progress, and adapting their strategies when faced with setbacks. Research has shown that this form of leaner-centered learning is best cultivated through structured, supportive environments that promote guided practice, scaffolded inquiry, and collaborative dialogue. In response, educational efforts have increasingly embraced artificial-intelligence (AI)-powered digital learning environments, ranging from educational apps and virtual labs to serious games. Recent advances in large language models (LLMs) and neuro-symbolic systems, meanwhile, offer a transformative opportunity to reimagine how support is delivered in digital learning environments. LLMs are enabling socially interactive learning experiences and scalable, cross-domain learning support that can adapt instructional strategies across varied subjects and contexts. In parallel, neuro-symbolic AI provides new avenues for designing these agents that are not only adaptive but also scalable across domains. Based on these remarks, this paper presents a multi-agent, neuro-symbolic framework designed to resolve the aforementioned challenges. The framework assigns distinct pedagogical roles to specialized agents: an RL-based 'tutor' agent provides authoritative, non-verbal scaffolding, while a proactive, LLM-powered 'peer' agent facilitates the social dimensions of learning. While prior work has explored such agents in isolation, our framework's novelty lies in unifying them through a central educational ontology. Through case studies in both college-level and middle school settings, we demonstrate the framework's adaptability across domains. We conclude by outlining key insights and future directions for advancing AI-driven learning environments.
Authors:Meir Nizri, Amos Azaria, Chirag Gupta, Noam Hazon
Abstract:
Calibration has been proposed as a way to enhance the reliability and adoption of machine learning classifiers. We study a particular aspect of this proposal: how does calibrating a classification model affect the decisions made by non-expert humans consuming the model's predictions? We perform a Human-Computer-Interaction (HCI) experiment to ascertain the effect of calibration on (i) trust in the model, and (ii) the correlation between decisions and predictions. We also propose further corrections to the reported calibrated scores based on Kahneman and Tversky's prospect theory from behavioral economics, and study the effect of these corrections on trust and decision-making. We find that calibration is not sufficient on its own; the prospect theory correction is crucial for increasing the correlation between human decisions and the model's predictions. While this increased correlation suggests higher trust in the model, responses to ``Do you trust the model more?" are unaffected by the method used.
Authors:Tailon D. Jackson, Byunggu Yu
Abstract:
This thesis investigates whether large language models (LLMs) can be guided to simulate a consistent personality through prompt engineering. The study explores this concept within the context of a chatbot designed for Speech-Language Pathology (SLP) student training, specifically focused on gender-affirming voice therapy. The chatbot, named Monae Jackson, was created to represent a 32-year-old transgender woman and engage in conversations simulating client-therapist interactions. Findings suggest that with prompt engineering, the chatbot maintained a recognizable and consistent persona and had a distinct personality based on the Big Five Personality test. These results support the idea that prompt engineering can be used to simulate stable personality characteristics in AI chatbots.
Authors:Neo Christopher Chung, Jakub Binda
Abstract:
Deep learning has transformed computer vision (CV), achieving outstanding performance in classification, segmentation, and related tasks. Such AI-based CV systems are becoming prevalent, with applications spanning from medical imaging to surveillance. State of the art models such as convolutional neural networks (CNNs) and vision transformers (ViTs) are often regarded as ``black boxes,'' offering limited transparency into their decision-making processes. Despite a recent advancement in explainable AI (XAI), explainability remains underutilized in practical CV deployments. A primary obstacle is the absence of integrated software solutions that connect XAI techniques with robust knowledge management and monitoring frameworks. To close this gap, we have developed Obz AI, a comprehensive software ecosystem designed to facilitate state-of-the-art explainability and observability for vision AI systems. Obz AI provides a seamless integration pipeline, from a Python client library to a full-stack analytics dashboard. With Obz AI, a machine learning engineer can easily incorporate advanced XAI methodologies, extract and analyze features for outlier detection, and continuously monitor AI models in real time. By making the decision-making mechanisms of deep models interpretable, Obz AI promotes observability and responsible deployment of computer vision systems.
Authors:Sana Athar, Devashish Gosain, Anja Feldmann, Mannat Kaur, Ha Dao
Abstract:
With the rapid increase in online interactions, concerns over data privacy and transparency of data processing practices have become more pronounced. While regulations like the GDPR have driven the widespread adoption of cookie banners in the EU, India's Digital Personal Data Protection Act (DPDPA) promises similar changes domestically, aiming to introduce a framework for data protection. However, certain clauses within the DPDPA raise concerns about potential infringements on user privacy, given the exemptions for government accountability and user consent requirements. In this study, for the first time, we explore Indian Internet users' awareness and perceptions of cookie banners, online privacy, and privacy regulations, especially in light of the newly passed DPDPA. We conducted an online anonymous survey with 428 Indian participants, which addressed: (1) users' perspectives on cookie banners, (2) their attitudes towards online privacy and privacy regulations, and (3) their acceptance of 10 contentious DPDPA clauses that favor state authorities and may enable surveillance. Our findings reveal that privacy-conscious users often lack consistent awareness of privacy mechanisms, and their concerns do not always lead to protective actions. Our thematic analysis of 143 open ended responses shows that users' privacy and data protection concerns are rooted in skepticism towards the government, shaping their perceptions of the DPDPA and fueling demands for policy revisions. Our study highlights the need for clearer communication regarding the DPDPA, user-centric consent mechanisms, and policy refinements to enhance data privacy practices in India.
Authors:Zhongyi Bai, Jens Emil Grønbæk, Andrew Irlitti, Jarrod Knibbe, Eduardo Velloso
Abstract:
We propose and explore the user experience of SEAM -- Stand-in Enhanced Asynchronous Meetings -- virtual reality meetings in which embodied virtual agents represent absent users. During the meeting, attendees can address the agent, and the absent user can later watch the recording from its perspective to respond. Through two mixed-method studies with 45 participants using the Wizard-of-Oz approach, we explored both the perspectives of the attendees in the original meeting and of the absent users later re-watching the meeting. We found that the stand-in can enhance meetings, benefiting both present and absent collaborators. Present attendees can easily access information that drives decision-making in the meeting perceive high social presence of absentees. Absentees also felt included when watching recordings because of the social interactions and attention towards them. Our contributions demonstrate a proof of concept for future asynchronous meetings in which collaborators can interact conversationally more akin to how they would if it had been synchronous.
Authors:Surat Teerakapibal, Poompak Kusawat
Abstract:
Since its recent debut, ChatGPT has become a global sensation and significantly impacted the field of education. Both educational researchers and practitioners have identified opportunities as well as risks associated with the use of this novel tool in educational settings. Despite the ongoing debate, there is still no research exploring occupational differences in the perception of ChatGPT in education. In this paper, we analyzed Twitter data using topic modeling and sentiment analysis to investigate how ChatGPT is perceived and discussed differently in different occupations. Our study found diverse topics discussed including its use in schools, impact on exams, academic integrity concerns, and response accuracy evaluations. While most tweets were positive or neutral, concerns about integrity and response accuracy were evident. Analysis revealed sentiment and topic variations among users' occupations. These findings emphasize the opportunities and challenges of integrating ChatGPT in education, necessitating continued monitoring and informed policy-making for responsible utilization.
Authors:Trung Hieu Pham, Chanh Minh Tran, Eiji Kamioka, Xuan Tan Phan
Abstract:
Creating immersive 3D visual experiences typically requires expensive and specialized hardware such as VR headsets, autostereoscopic displays, or active shutter glasses. These constraints limit the accessibility and everyday use of 3D visualization technologies in resource-constrained settings. To address this, we propose a low-cost system that enables real-time 3D light-field viewing using only a standard 2D monitor, a conventional RGB webcam, and red-cyan anaglyph glasses. The system integrates real-time eye-tracking to dynamically adapt the displayed light-field image to the user's head position with a lightweight rendering pipeline that selects and composites stereoscopic views from pre-captured light-field data. The resulting anaglyph image is updated in real-time, creating a more immersive and responsive 3D experience. The system operates entirely on CPU and maintains a stable frame rate of 30 FPS, confirming its feasibility on typical consumer-grade hardware. All of these highlight the potential of our approach as an accessible platform for interactive 3D applications in education, digital media, and beyond.
Authors:Xu Pan, Jingxuan Fan, Zidi Xiong, Ely Hahami, Jorin Overwiening, Ziqian Xie
Abstract:
Large language models (LLMs) can bias towards relying on their own or the user's information in chat history, leading to overly stubborn or agreeable behaviors in multi-turn conversations. In this paper, we formalize this model characteristic as user-assistant bias and introduce an 8k multi-turn conversation dataset $\textbf{UserAssist}$, which we use to benchmark, understand and manipulate the user-assistant bias in frontier LLMs. Leveraging $\textbf{UserAssist-test}$, we first benchmark the user-assistant bias of 26 commercial and 26 open-weight models. Commercial models show various levels of user bias. Evaluation on open-weight models reveals significant user bias in the instruction-tuned models, and weak user bias in reasoning (or reasoning-distilled) models. We then perform controlled fine-tuning experiments to pinpoint the post-training recipe contributing to these bias shifts: human preference alignment increases user bias, while training on chain-of-thought reasoning traces decreases it. Finally, we demonstrate that user-assistant bias can be bidirectionally adjusted by performing direct preference optimization (DPO) on $\textbf{UserAssist-train}$, and generalizes well to both in-domain and out-of-domain conversations. Our results provide insights into how the LLM integrates information from different sources, and also a viable way to detect and control model abnormalities.
Authors:Alaul Islam, Fairouz Grioui, Raimund Dachselt, Petra Isenberg
Abstract:
We present the results of an in-situ ideation workshop for designing data visualizations on smart wristbands that can show data around the entire wrist of a wearer. Wristbands pose interesting challenges because the visibility of different areas of the band depends on the wearer's arm posture. We focused on four usage scenarios that lead to different postures: office work, leisurely walks, cycling, and driving. As the technology for smart wristbands is not yet commercially available, we conducted a paper-based ideation exercise that showed how spatial layout and visualization design on smart wristbands may need to vary depending on the types of data items of interest and arm postures. Participants expressed a strong preference for responsive visualization designs that could adapt to the movement of wearers' arms. Supplemental material from the study is available here: https://osf.io/4hrca/.
Authors:Wen-Fan Wang, Ting-Ying Lee, Chien-Ting Lu, Che-Wei Hsu, Nil Ponsa CampanyÃ, Yu Chen, Mike Y. Chen, Bing-Yu Chen
Abstract:
Environment designers in the entertainment industry create imaginative 2D and 3D scenes for games, films, and television, requiring both fine-grained control of specific details and consistent global coherence. Designers have increasingly integrated generative AI into their workflows, often relying on large language models (LLMs) to expand user prompts for text-to-image generation, then iteratively refining those prompts and applying inpainting. However, our formative study with 10 designers surfaced two key challenges: (1) the lengthy LLM-generated prompts make it difficult to understand and isolate the keywords that must be revised for specific visual elements; and (2) while inpainting supports localized edits, it can struggle with global consistency and correctness. Based on these insights, we present GenTune, an approach that enhances human--AI collaboration by clarifying how AI-generated prompts map to image content. Our GenTune system lets designers select any element in a generated image, trace it back to the corresponding prompt labels, and revise those labels to guide precise yet globally consistent image refinement. In a summative study with 20 designers, GenTune significantly improved prompt--image comprehension, refinement quality, and efficiency, and overall satisfaction (all $p < .01$) compared to current practice. A follow-up field study with two studios further demonstrated its effectiveness in real-world settings.
Authors:Huizhong Cao, Henrik Söderlund, Qi Fang, Siyuan Chen, Lejla Erdal, Ammar Gubartalla, Paulo Victor Lopes, Guodong Shao, Per Lonnehed, Henri Putto, Abbe Ahmed, Sven Ekered, Björn Johansson
Abstract:
Since the introduction of Industry 4.0, digital twin technology has significantly evolved, laying the groundwork for a transition toward Industry 5.0 principles centered on human-centricity, sustainability, and resilience. Through digital twins, real-time connected production systems are anticipated to be more efficient, resilient, and sustainable, facilitating communication and connectivity between digital and physical systems. However, environmental performance and integration with virtual reality (VR) and artificial intelligence (AI) of such systems remain challenging. Further exploration of digital twin technologies is needed to validate the real-world impact and benefits. This paper investigates these challenges by implementing a real-time digital twin based on the ISO 23247 standard, connecting the physical factory and simulation software with VR capabilities. This digital twin system provides cognitive assistance and a user-friendly interface for operators, thereby improving cognitive ergonomics. The connection of the Internet of Things (IoT) platform allows the digital twin to have real-time bidirectional communication, collaboration, monitoring, and assistance. A lab-scale drone factory was used as the digital twin application to test and evaluate the ISO 23247 standard and its potential benefits. Additionally, AI integration and environmental performance Key Performance Indicators (KPIs) have been considered as the next stages in improving VR-integrated digital twins. With a solid theoretical foundation and a demonstration of the VR-integrated digital twins, this paper addresses integration issues between various technologies and advances the framework of digital twins based on ISO 23247.
Authors:Sydney Thompson, Kate Candon, Marynel Vázquez
Abstract:
The Human-Robot Interaction (HRI) community often highlights the social context of an interaction as a key consideration when designing, implementing, and evaluating robot behavior. Unfortunately, researchers use the term "social context" in varied ways. This can lead to miscommunication, making it challenging to draw connections between related work on understanding and modeling the social contexts of human-robot interactions. To address this gap, we survey the HRI literature for existing definitions and uses of the term "social context". Then, we propose a conceptual model for describing the social context of a human-robot interaction. We apply this model to existing work, and we discuss a range of attributes of social contexts that can help researchers plan for interactions, develop behavior models for robots, and gain insights after interactions have taken place. We conclude with a discussion of open research questions in relation to understanding and modeling the social contexts of human-robot interactions.
Authors:Oliver Hein, Sandra Wackerl, Changkun Ou, Florian Alt, Francesco Chiossi
Abstract:
Many exergames face challenges in keeping users within safe and effective intensity levels during exercise. Meanwhile, although wearable devices continuously collect physiological data, this information is seldom leveraged for real-time adaptation or to encourage user reflection. We designed and evaluated a VR cycling simulator that dynamically adapts based on users' heart rate zones. First, we conducted a user study (N=50) comparing eight visualization designs to enhance engagement and exertion control, finding that gamified elements like non-player characters (NPCs) were promising for feedback delivery. Based on these findings, we implemented a physiology-adaptive exergame that adjusts visual feedback to keep users within their target heart rate zones. A lab study (N=18) showed that our system has potential to help users maintain their target heart rate zones. Subjective ratings of exertion, enjoyment, and motivation remained largely unchanged between conditions. Our findings suggest that real-time physiological adaptation through NPC visualizations can improve workout regulation in exergaming.
Authors:Zhongqin Wang, J. Andrew Zhang, Kai Wu, Min Xu, Y. Jay Guo
Abstract:
Integrated Sensing and Communication (ISAC) is a key enabler for next-generation wireless systems. However, real-world deployment is often limited to low-cost, single-antenna transceivers. In such bistatic Single-Input Single-Output (SISO) setup, clock asynchrony introduces random phase offsets in Channel State Information (CSI), which cannot be mitigated using conventional multi-antenna methods. This work proposes WiDFS 3.0, a lightweight bistatic SISO sensing framework that enables accurate delay and Doppler estimation from distorted CSI by effectively suppressing Doppler mirroring ambiguity. It operates with only a single antenna at both the transmitter and receiver, making it suitable for low-complexity deployments. We propose a self-referencing cross-correlation (SRCC) method for SISO random phase removal and employ delay-domain beamforming to resolve Doppler ambiguity. The resulting unambiguous delay-Doppler-time features enable robust sensing with compact neural networks. Extensive experiments show that WiDFS 3.0 achieves accurate parameter estimation, with performance comparable to or even surpassing that of prior multi-antenna methods, especially in delay estimation. Validated under single- and multi-target scenarios, the extracted ambiguity-resolved features show strong sensing accuracy and generalization. For example, when deployed on the embedded-friendly MobileViT-XXS with only 1.3M parameters, WiDFS 3.0 consistently outperforms conventional features such as CSI amplitude, mirrored Doppler, and multi-receiver aggregated Doppler.
Authors:Long Ling, Xinyi Chen, Ruoyu Wen, Toby Jia-Jun Li, Ray LC
Abstract:
Character design in games involves interdisciplinary collaborations, typically between designers who create the narrative content, and illustrators who realize the design vision. However, traditional workflows face challenges in communication due to the differing backgrounds of illustrators and designers, the latter with limited artistic abilities. To overcome these challenges, we created Sketchar, a Generative AI (GenAI) tool that allows designers to prototype game characters and generate images based on conceptual input, providing visual outcomes that can give immediate feedback and enhance communication with illustrators' next step in the design cycle. We conducted a mixed-method study to evaluate the interaction between game designers and Sketchar. We showed that the reference images generated in co-creating with Sketchar fostered refinement of design details and can be incorporated into real-world workflows. Moreover, designers without artistic backgrounds found the Sketchar workflow to be more expressive and worthwhile. This research demonstrates the potential of GenAI in enhancing interdisciplinary collaboration in the game industry, enabling designers to interact beyond their own limited expertise.
Authors:Wenqing Wang, Yun Fu
Abstract:
Emotion is a critical component of artificial social intelligence. However, while current methods excel in lip synchronization and image quality, they often fail to generate accurate and controllable emotional expressions while preserving the subject's identity. To address this challenge, we introduce RealTalk, a novel framework for synthesizing emotional talking heads with high emotion accuracy, enhanced emotion controllability, and robust identity preservation. RealTalk employs a variational autoencoder (VAE) to generate 3D facial landmarks from driving audio, which are concatenated with emotion-label embeddings using a ResNet-based landmark deformation model (LDM) to produce emotional landmarks. These landmarks and facial blendshape coefficients jointly condition a novel tri-plane attention Neural Radiance Field (NeRF) to synthesize highly realistic emotional talking heads. Extensive experiments demonstrate that RealTalk outperforms existing methods in emotion accuracy, controllability, and identity preservation, advancing the development of socially intelligent AI systems.
Authors:Justin Phillips, Daniel Roggen, Cathy Speed, Robert Harle
Abstract:
Cardio Load, introduced by Google in 2024, is a measure of cardiovascular work (also known as training load) resulting from all the user's activities across the day. It is based on heart rate reserve and captures both activity intensity and duration. Thanks to feedback from users and internal research, we introduce adaptive and personalized targets which will be set weekly. This feature will be available in the Public Preview of the Fitbit app after September 2025. This white paper provides a comprehensive overview of Cardio Load (CL) and how weekly CL targets are established, with examples shown to illustrate the effect of varying CL on the weekly target. We compare Cardio Load and Active Zone Minutes (AZMs), highlighting their distinct purposes, i.e. AZMs for health guidelines and CL for performance measurement. We highlight that CL is accumulated both during active workouts and incidental daily activities, so users are able top-up their CL score with small bouts of activity across the day.
Authors:Songqin Nong, Jingxuan Xu, Sheng Zhou, Jianfeng Chen, Xiaoxuan Tang, Tao Jiang, Wenhao Xu
Abstract:
As autonomous agents become adept at understanding and interacting with graphical user interface (GUI) environments, a new era of automated task execution is emerging. Recent studies have demonstrated that Reinforcement Learning (RL) can effectively enhance agents' performance in dynamic interactive GUI environments. However, these methods face two key limitations: (1) they overlook the significant variation in difficulty across different GUI tasks by treating the entire training data as a uniform set, which hampers the agent's ability to adapt its learning process; and (2) most approaches collapse task-specific nuances into a single, coarse reward, leaving the agent with a uniform signal that yields inefficient policy updates. To address these limitations, we propose CRAFT-GUI, a curriculum learning framework based on Group Relative Policy Optimization (GRPO) that explicitly accounts for the varying difficulty across trajectories. To enable more fine-grained policy optimization, we design a reward function that combines simple rule-based signals with model-judged evaluation, providing richer and more nuanced feedback during training. Experimental results demonstrate that our method achieves significant improvements over previous state-of-the-art approaches, outperforming them by 5.6% on public benchmarks Android Control and 10.3% on our internal online benchmarks, respectively. These findings empirically validate the effectiveness of integrating reinforcement learning with curriculum learning in GUI interaction tasks.
Authors:Liming Xu, Dave Towey, Andrew P. French, Steve Benford
Abstract:
The increasing ubiquity of smartphones and resurgence of VR/AR techniques, it is expected that our everyday environment may soon be decorating with objects connecting with virtual elements. Alerting to the presence of these objects is therefore the first step for motivating follow-up further inspection and triggering digital material attached to the objects. This work studies a special kind of these objects -- Artcodes -- a human-meaningful and machine-readable decorative markers that camouflage themselves with freeform appearance by encoding information into their topology. We formulate this problem of recongising the presence of Artcodes as Artcode proposal detection, a distinct computer vision task that classifies topologically similar but geometrically and semantically different objects as a same class. To deal with this problem, we propose a new feature descriptor, called the shape of orientation histogram, to describe the generic topological structure of an Artcode. We collect datasets and conduct comprehensive experiments to evaluate the performance of the Artcode detection proposer built upon this new feature vector. Our experimental results show the feasibility of the proposed feature vector for representing topological structures and the effectiveness of the system for detecting Artcode proposals. Although this work is an initial attempt to develop a feature-based system for detecting topological objects like Artcodes, it would open up new interaction opportunities and spark potential applications of topological object detection.
Authors:Chidera W. Amazu, Joseph Mietkiewicz, Ammar N. Abbas, Gabriele Baldissone, Davide Fissore, Micaela Demichela, Anders L. Madsen, Maria Chiara Leva
Abstract:
Data from psychophysiological measures can offer new insight into control room operators' behaviour, cognition, and mental workload status. This can be particularly helpful when combined with appraisal of capacity to respond to possible critical plant conditions (i.e. critical alarms response scenarios). However, wearable physiological measurement tools such as eye tracking and EEG caps can be perceived as intrusive and not suitable for usage in daily operations. Therefore, this article examines the potential of using real-time data from process and operator-system interactions during abnormal scenarios that can be recorded and retrieved from the distributed control system's historian or process log, and their capacity to provide insight into operator behavior and predict their response outcomes, without intruding on daily tasks. Data for this study were obtained from a design of experiment using a formaldehyde production plant simulator and four human-in-the-loop experimental support configurations. A comparison between the different configurations in terms of both behaviour and performance is presented in this paper. A step-wise logistic regression and a Bayesian network models were used to achieve this objective. The results identified some predictive metrics and the paper discuss their value as precursor or predictor of overall system performance in alarm response scenarios. Knowledge of relevant and predictive behavioural metrics accessible in real time can better equip decision-makers to predict outcomes and provide timely support measures for operators.
Authors:Paul Schreiber, Beyza Cinar, Lennart Mackert, Maria Maleshkova
Abstract:
Human-Computer Interaction (HCI) is a multi-modal, interdisciplinary field focused on designing, studying, and improving the interactions between people and computer systems. This involves the design of systems that can recognize, interpret, and respond to human emotions or stress. Developing systems to monitor and react to stressful events can help prevent severe health implications caused by long-term stress exposure. Currently, the publicly available datasets and standardized protocols for data collection in this domain are limited. Therefore, we introduce a multi-modal dataset intended for wearable affective computing research, specifically the development of automated stress recognition systems. We systematically review the publicly available datasets recorded in controlled laboratory settings. Based on a proposed framework for the standardization of stress experiments and data collection, we collect physiological and motion signals from wearable devices (e.g., electrodermal activity, photoplethysmography, three-axis accelerometer). During the experimental protocol, we differentiate between the following four affective/activity states: neutral, physical, cognitive stress, and socio-evaluative stress. These different phases are meticulously labeled, allowing for detailed analysis and reconstruction of each experiment. Meta-data such as body positions, locations, and rest phases are included as further annotations. In addition, we collect psychological self-assessments after each stressor to evaluate subjects' affective states. The contributions of this paper are twofold: 1) a novel multi-modal, publicly available dataset for automated stress recognition, and 2) a benchmark for stress detection with 89\% in a binary classification (baseline vs. stress) and 82\% in a multi-class classification (baseline vs. stress vs. physical exercise).
Authors:Yiannos Demetriou, Manasvi Parikh, Sara Eskandari, Westley Weimer, Madeline Endres
Abstract:
Background: Spatial reasoning has been identified as a critical skill for success in STEM. Unfortunately, under-represented groups often have lower incoming spatial ability. Courses that improve spatial skills exist but are not widely used. Virtual reality (VR) has been suggested as a possible tool for teaching spatial reasoning since students are more accurate and complete spatial tasks more quickly in three dimensions. However, no prior work has developed or evaluated a fully-structured VR spatial skills course. Objectives: We seek to assess the effectiveness of teaching spatial reasoning in VR, both in isolation as a structured training curriculum and also in comparison to traditional methods. Methods: We adapted three modules of an existing pencil-and-paper course to VR, leveraging educational scaffolding and real-time feedback in the design. We evaluated our three-week course in a study with $n=24$ undergraduate introductory STEM students, capturing both quantitative spatial ability gains (using pre- and post test scores on validated assessments) and qualitative insights (from a post-study questionnaire). We also compared our VR course to an offering of a baseline non-VR course (using data collected in a previous study). Results and Conclusions: Students who took our VR course had significant spatial ability gains. Critically, we find no significant difference in outcomes between our VR course (3 meetings of 120 minutes each) and a baseline pencil and paper course (10 meetings of 90 minutes each), suggesting that spatial reasoning can be very efficiently taught in VR. We observed cybersickness at lower rates than are generally reported and most students reported enjoying learning in VR.
Authors:Daniel Raffini, Agnese Macori, Marco Angelini, Tiziana Catarci
Abstract:
The paper explores the study of gender-based narrative biases in stories generated by ChatGPT, Gemini, and Claude. The prompt design draws on Propp's character classifications and Freytag's narrative structure. The stories are analyzed through a close reading approach, with particular attention to adherence to the prompt, gender distribution of characters, physical and psychological descriptions, actions, and finally, plot development and character relationships. The results reveal the persistence of biases - especially implicit ones - in the generated stories and highlight the importance of assessing biases at multiple levels using an interpretative approach.
Authors:Daniel Raffini, Agnese Macori, Lorenzo Porcaro, Tiziana Catarci, Marco Angelini
Abstract:
This study examines the rhetorical and linguistic features of argumentative texts generated by ChatGPT on ethically nuanced topics and investigates their persuasive impact on human readers.Through a user study involving 62 participants and pre-post interaction surveys, the paper analyzes how exposure to AI-generated arguments affects opinion change and user perception. A linguistic and rhetorical analysis of the generated texts reveals a consistent argumentative macrostructure, reliance on formulaic expressions, and limited stylistic richness. While ChatGPT demonstrates proficiency in constructing coherent argumentative texts, its persuasive efficacy appears constrained, particularly on topics involving ethical issues.The study finds that while participants often acknowledge the benefits highlighted by ChatGPT, ethical concerns tend to persist or even intensify post-interaction. The results also demonstrate a variation depending on the topic. These findings highlight new insights on AI-generated persuasion in ethically sensitive domains and are a basis for future research.
Authors:Michael Fennel, Markus Walker, Dominik Pikos, Uwe D. Hanebeck
Abstract:
Research in virtual reality and haptic technologies has consistently aimed to enhance immersion. While advanced head-mounted displays are now commercially available, kinesthetic haptic interfaces still face challenges such as limited workspaces, insufficient degrees of freedom, and kinematics not matching the human arm. In this paper, we present HapticGiant, a novel large-scale kinesthetic haptic interface designed to match the properties of the human arm as closely as possible and to facilitate natural user locomotion while providing full haptic feedback. The interface incorporates a novel admittance-type force control scheme, leveraging hierarchical optimization to render both arbitrary serial kinematic chains and Cartesian admittances. Notably, the proposed control scheme natively accounts for system limitations, including joint and Cartesian constraints, as well as singularities. Experimental results demonstrate the effectiveness of HapticGiant and its control scheme, paving the way for highly immersive virtual reality applications.
Authors:Von Ralph Dane Marquez Herbuela, Yukie Nagai
Abstract:
Many individuals especially those with autism spectrum disorder (ASD), alexithymia, or other neurodivergent profiles face challenges in recognizing, expressing, or interpreting emotions. To support more inclusive and personalized emotion technologies, we present a real-time multimodal emotion estimation system that combines neurophysiological EEG, ECG, blood volume pulse (BVP), and galvanic skin response (GSR/EDA) and behavioral modalities (facial expressions, and speech) in a unified arousal-valence 2D interface to track moment-to-moment emotional states. This architecture enables interpretable, user-specific analysis and supports applications in emotion education, neuroadaptive feedback, and interaction support for neurodiverse users. Two demonstration scenarios illustrate its application: (1) passive media viewing (2D or VR videos) reveals cortical and autonomic responses to affective content, and (2) semi-scripted conversations with a facilitator or virtual agent capture real-time facial and vocal expressions. These tasks enable controlled and naturalistic emotion monitoring, making the system well-suited for personalized feedback and neurodiversity-informed interaction design.
Authors:Jonathan C. Roberts, Peter Butcher, Panagiotis D. Ritsos
Abstract:
This paper explores the use of scenario-based visualisation examples as a pedagogical strategy for teaching students the complexities of data insight, representation, and interpretation. Teaching data visualisation often involves explaining intricate issues related to data management and the challenges of presenting data meaningfully. In this work, we present a series of data-driven scenarios. These concise stories depict specific situations, and are created to help the educators highlight key concerns in data communication, such as chart selection, temporal versus categorical comparison, visual bias, and narrative framing. By grounding these examples in real-world contexts, students are encouraged to critically assess not only what the data shows, but how and why it is shown that way. The paper presents a collection of example scenarios, that educators can use for their own lessons; the work fits with a larger project on looking at critical thinking in the classroom, and developing appropriate tools. We also start to abstract principles, from our approach, so that others can develop their own scenarios for their teaching. Our approach aligns with principles of authentic and scenario-based learning, using real-world contexts to foster critical engagement with data.
Authors:Anaëlle Beignon, Thomas Thibault, Nolwenn Maudet
Abstract:
Generative AI is being massively deployed in digital services, at a scale that will result in significant environmental harm. We document how tech companies are transforming established user interfaces to impose AI use and show how and to what extent these strategies fit within established deceptive pattern categories. We identify two main design strategies that are implemented to impose AI use in both personal and professional contexts: imposing AI features in interfaces at the expense of existing non-AI features and promoting narratives about AI that make it harder to resist using it. We discuss opportunities for regulating the imposed adoption of AI features, which would inevitably lead to negative environmental effects.
Authors:Ruolin wang, Xingyu Liu, Biao Wang, Wayne Zhang, Ziqian Liao, Ziwen Li, Amy Pavel, Xiang 'Anthony' Chen
Abstract:
The rapid growth of online video content has outpaced efforts to make visual information accessible to blind and low vision (BLV) audiences. While professional Audio Description (AD) remains the gold standard, it is costly and difficult to scale across the vast volume of online media. In this work, we explore a complementary approach to broaden participation in video accessibility: engaging everyday video viewers at their watching and commenting time. We introduce CoSight, a Chrome extension that augments YouTube with lightweight, in-situ nudges to support descriptive commenting. Drawing from Fogg's Behavior Model, CoSight provides visual indicators of accessibility gaps, pop-up hints for what to describe, reminders to clarify vague comments, and related captions and comments as references. In an exploratory study with 48 sighted users, CoSight helped integrate accessibility contribution into natural viewing and commenting practices, resulting in 89% of comments including grounded visual descriptions. Follow-up interviews with four BLV viewers and four professional AD writers suggest that while such comments do not match the rigor of professional AD, they can offer complementary value by conveying visual context and emotional nuance for understanding the videos.
Authors:Jon E. Froehlich, Alexander Fiannaca, Nimer Jaber, Victor Tsaran, Shaun Kane
Abstract:
Interactive streetscape mapping tools such as Google Street View (GSV) and Meta Mapillary enable users to virtually navigate and experience real-world environments via immersive 360° imagery but remain fundamentally inaccessible to blind users. We introduce StreetReaderAI, the first-ever accessible street view tool, which combines context-aware, multimodal AI, accessible navigation controls, and conversational speech. With StreetReaderAI, blind users can virtually examine destinations, engage in open-world exploration, or virtually tour any of the over 220 billion images and 100+ countries where GSV is deployed. We iteratively designed StreetReaderAI with a mixed-visual ability team and performed an evaluation with eleven blind users. Our findings demonstrate the value of an accessible street view in supporting POI investigations and remote route planning. We close by enumerating key guidelines for future work.
Authors:Krisha Mehta, Gordon Kindlmann, Alex Kale
Abstract:
Visualizing data often entails data transformations that can reveal and hide information, operations we dub disclosure tactics. Whether designers hide information intentionally or as an implicit consequence of other design choices, tools and frameworks for visualization offer little explicit guidance on disclosure. To systematically characterize how visualizations can limit access to an underlying dataset, we contribute a content analysis of 425 examples of visualization techniques sampled from academic papers in the visualization literature, resulting in a taxonomy of disclosure tactics. Our taxonomy organizes disclosure tactics based on how they change the data representation underlying a chart, providing a systematic way to reason about design trade-offs in terms of what information is revealed, distorted, or hidden. We demonstrate the benefits of using our taxonomy by showing how it can guide reasoning in design scenarios where disclosure is a first-order consideration. Adopting disclosure as a framework for visualization research offers new perspective on authoring tools, literacy, uncertainty communication, personalization, and ethical design.
Authors:Jakub Binda, Valentina Paneta, Vasileios Eleftheriadis, Hongkyou Chung, Panagiotis Papadimitroulas, Neo Christopher Chung
Abstract:
Generative AI holds great potentials to automate and enhance data synthesis in nuclear medicine. However, the high-stakes nature of biomedical imaging necessitates robust mechanisms to detect and manage unexpected or erroneous model behavior. We introduce development and implementation of a hybrid anomaly detection framework to safeguard GenAI models in BIOEMTECH's eyes(TM) systems. Two applications are demonstrated: Pose2Xray, which generates synthetic X-rays from photographic mouse images, and DosimetrEYE, which estimates 3D radiation dose maps from 2D SPECT/CT scans. In both cases, our outlier detection (OD) enhances reliability, reduces manual oversight, and supports real-time quality control. This approach strengthens the industrial viability of GenAI in preclinical settings by increasing robustness, scalability, and regulatory compliance.
Authors:Syed Ibrahim Mustafa Shah Bukhari, Maha Sajid, Bo Ji, Brendan David-John
Abstract:
As Extended Reality (XR) devices become increasingly prevalent in everyday settings, they raise significant privacy concerns for bystanders: individuals in the vicinity of an XR device during its use, whom the device sensors may accidentally capture. Current privacy indicators, such as small LEDs, often presume that bystanders are attentive enough to interpret the privacy signals. However, these cues can be easily overlooked when bystanders are distracted or have limited vision. We define such individuals as situationally impaired bystanders. This study explores XR privacy indicator designs that are effective for situationally impaired bystanders. A focus group with eight participants was conducted to design five novel privacy indicators. We evaluated these designs through a user study with seven additional participants. Our results show that visual-only indicators, typical in commercial XR devices, received low ratings for perceived usefulness in impairment scenarios. In contrast, multimodal indicators were preferred in privacy-sensitive scenarios with situationally impaired bystanders. Ultimately, our results highlight the need to move toward adaptable, multimodal, and situationally aware designs that effectively support bystander privacy in everyday XR environments.
Authors:Nels Numan, Jessica Van Brummelen, Ziwen Lu, Anthony Steed
Abstract:
Site-specific outdoor AR experiences are typically authored using static 3D models, but are deployed in physical environments that change over time. As a result, virtual content may become misaligned with its intended real-world referents, degrading user experience and compromising contextual interpretation. We present AdjustAR, a system that supports in-situ correction of AR content in dynamic environments using multimodal large language models (MLLMs). Given a composite image comprising the originally authored view and the current live user view from the same perspective, an MLLM detects contextual misalignments and proposes revised 2D placements for affected AR elements. These corrections are backprojected into 3D space to update the scene at runtime. By leveraging MLLMs for visual-semantic reasoning, this approach enables automated runtime corrections to maintain alignment with the authored intent as real-world target environments evolve.
Authors:Amy Rae Fox, Michelle Morgenstern, Graham M. Jones, Arvind Satyanarayan
Abstract:
What impressions might readers form with visualizations that go beyond the data they encode? In this paper, we build on recent work that demonstrates the socio-indexical function of visualization, showing that visualizations communicate more than the data they explicitly encode. Bridging this with prior work examining public discourse about visualizations, we contribute an analytic framework for describing inferences about an artifact's social provenance. Via a series of attribution-elicitation surveys, we offer descriptive evidence that these social inferences: (1) can be studied asynchronously, (2) are not unique to a particular sociocultural group or a function of limited data literacy, and (3) may influence assessments of trust. Further, we demonstrate (4) how design features act in concert with the topic and underlying messages of an artifact's data to give rise to such 'beyond-data' readings. We conclude by discussing the design and research implications of inferences about social provenance, and why we believe broadening the scope of research on human factors in visualization to include sociocultural phenomena can yield actionable design recommendations to address urgent challenges in public data communication.
Authors:Pyeonghwa Kim, Steve Sawyer, Michael Dunn
Abstract:
We advance gender-inclusive research within the CSCW field by investigating the long-term gendered experiences of online freelancers on digital labor platforms. The prevalence of gender-based inequalities has attracted significant attention within the CSCW community. Yet, insights remain limited on how these inequalities shape workers' long-term experiences on digital labor platforms. Through a five-year longitudinal study of 105 freelancers on Upwork, we reveal persistent gender disparities that influence workers' long-term work and career trajectories, raising concerns about the sustainability of platform-mediated work. We advance the ongoing dialogue on gender inclusivity in the community by introducing the concepts of career disempowerment and platform-mediated motherhood penalty and by offering research and design implications for CSCW to foster more sustainable, equitable platform work environments for all genders.
Authors:Michelle Morgenstern, Amy Rae Fox, Graham M. Jones, Arvind Satyanarayan
Abstract:
In contemporary information ecologies saturated with misinformation, disinformation, and a distrust of science itself, public data communication faces significant hurdles. Although visualization research has broadened criteria for effective design, governing paradigms privilege the accurate and efficient transmission of data. Drawing on theory from linguistic anthropology, we argue that such approaches-focused on encoding and decoding propositional content-cannot fully account for how people engage with visualizations and why particular visualizations might invite adversarial or receptive responses. In this paper, we present evidence that data visualizations communicate not only semantic, propositional meaning$\unicode{x2013}$meaning about data$\unicode{x2013}$but also social, indexical meaning$\unicode{x2013}$meaning beyond data. From a series of ethnographically-informed interviews, we document how readers make rich and varied assessments of a visualization's "vibes"$\unicode{x2013}$inferences about the social provenance of a visualization based on its design features. Furthermore, these social attributions have the power to influence reception, as readers' decisions about how to engage with a visualization concern not only content, or even aesthetic appeal, but also their sense of alignment or disalignment with the entities they imagine to be involved in its production and circulation. We argue these inferences hinge on a function of human sign systems that has thus far been little studied in data visualization: socio-indexicality, whereby the formal features (rather than the content) of communication evoke social contexts, identities, and characteristics. Demonstrating the presence and significance of this socio-indexical function in visualization, this paper offers both a conceptual foundation and practical intervention for troubleshooting breakdowns in public data communication.
Authors:Kojiro Tanaka, Keiichi Sato, Masahiko Mikawa, Makoto Fujisawa
Abstract:
Recent research has focused on incorporating media into living environments via color-controlled materials and image display. In particular, grass-based displays have drawn attention as landscape-friendly interactive interfaces. To develop the grass display, it is important to obtain the grass color change characteristics that depend on the real environment. However, conventional methods require experiments on actual equipment every time the lighting or viewpoint changes, which is time-consuming and costly. Although research has begun on simulating grass colors, this approach still faces significant issues as it takes many hours for a single measurement. In this paper, we explore an interactive simulation of a grass display color change characteristic based on real-world conditions in a virtual environment. We evaluated our method's accuracy by simulating grass color characteristics across multiple viewpoints and environments, and then compared the results against prior work. The results indicated that our method tended to simulate the grass color characteristics similar to the actual characteristics and showed the potential to do so more quickly and with comparable accuracy to the previous study.
Authors:Kathy Cheng, Alison Olechowski, Shurui Zhou
Abstract:
In today's landscape, hardware development teams face increasing demands for better quality products, greater innovation, and shorter manufacturing lead times. Despite the need for more efficient and effective processes, hardware designers continue to struggle with a lack of awareness of design changes and other collaborators' actions, a persistent issue in decades of CSCW research. One significant and unaddressed challenge is understanding and managing dependencies between 3D CAD (computer-aided design) models, especially when products can contain thousands of interconnected components. In this two-phase formative study, we explore designers' pain points of CAD dependency management through a thematic analysis of 100 online forum discussions and semi-structured interviews with 10 designers. We identify nine key challenges related to the traceability, navigation, and consistency of CAD dependencies, that harm the effective coordination of hardware development teams. To address these challenges, we propose design goals and necessary features to enhance hardware designers' awareness and management of dependencies, ultimately with the goal of improving collaborative workflows.
Authors:Ahmad Farooq, Kamran Iqbal
Abstract:
As artificial intelligence (AI) and robotics increasingly permeate society, ensuring the ethical behavior of these systems has become paramount. This paper contends that transparency in AI decision-making processes is fundamental to developing trustworthy and ethically aligned robotic systems. We explore how transparency facilitates accountability, enables informed consent, and supports the debugging of ethical algorithms. The paper outlines technical, ethical, and practical challenges in implementing transparency and proposes novel approaches to enhance it, including standardized metrics, explainable AI techniques, and user-friendly interfaces. This paper introduces a framework that connects technical implementation with ethical considerations in robotic systems, focusing on the specific challenges of achieving transparency in dynamic, real-world contexts. We analyze how prioritizing transparency can impact public trust, regulatory policies, and avenues for future research. By positioning transparency as a fundamental element in ethical AI system design, we aim to add to the ongoing discussion on responsible AI and robotics, providing direction for future advancements in this vital field.
Authors:Jonathan C. Roberts, Hanan Alnjar, Aron E. Owen, Panagiotis D. Ritsos
Abstract:
We present the Critical Design Strategy (CDS) - a structured method designed to facilitate the examination of visualisation designs through reflection and critical thought. The CDS helps designers think critically and make informed improvements using heuristic evaluation. When developing a visual tool or pioneering a novel visualisation approach, identifying areas for enhancement can be challenging. Critical thinking is particularly crucial for visualisation designers and tool developers, especially those new to the field, such as studying visualisation in higher education. The CDS consists of three stages across six perspectives: Stage 1 captures the essence of the idea by assigning an indicative title and selecting five adjectives (from twenty options) to form initial impressions of the design. Stage 2 involves an in-depth critique using 30 heuristic questions spanning six key perspectives - user, environment, interface, components, design, and visual marks. Stage 3 focuses on synthesising insights, reflecting on design decisions, and determining the next steps forward. We introduce the CDS and explore its use across three visualisation modules in both undergraduate and postgraduate courses. Our longstanding experience with the CDS has allowed us to refine and develop it over time: from its initial creation through workshops in 2017/18 to improvements in wording and the development of two applications by 2020, followed by the expansion of support notes and refinement of heuristics through 2023; while using it in our teaching each year. This sustained use allows us to reflect on its practical application and offer guidance on how others can incorporate it into their own work.
Authors:Abdulrhman Alorini, Yufeng Wu, Abdullah Bin Sawad, Mukesh Prasad, A. Baki Kocaballi
Abstract:
Smart speakers are increasingly integrated into domestic life worldwide, yet their privacy risks remain underexplored in non-Western cultural contexts. This study investigates how Saudi Arabian users of smart speakers navigate privacy concerns within collectivist, gendered, and often multigenerational households. Using cultural probes followed by semi-structured interviews with 16 participants, we uncover everyday privacy-protective behaviours including unplugging devices, muting microphones, and avoiding voice interactions altogether. These practices are shaped not only by individual risk perceptions but also by household norms, room configurations, and interpersonal dynamics. We contribute empirical insights from an underrepresented region, theoretical extensions to contextual integrity frameworks, and design directions for culturally responsive voice interfaces. This work expands the global conversation on smart speaker privacy and informs more inclusive HCI practices in increasingly diverse smart home environments.
Authors:Angela Locoro, Silvia Golia, Davide Falessi
Abstract:
The underspecification of progressive levels of difficulty in measurement constructs design and assessment tests for data visualization literacy may hinder the expressivity of measurements in both test design and test reuse. To mitigate this problem, this paper proposes DRIVE-T (Discriminating and Representative Items for Validating Expressive Tests), a methodology designed to drive the construction and evaluation of assessment items. Given a data vizualization, DRIVE-T supports the identification of task-based items discriminability and representativeness for measuring levels of data visualization literacy. DRIVE-T consists of three steps: (1) tagging task-based items associated with a set of data vizualizations; (2) rating them by independent raters for their difficulty; (3) analysing raters' raw scores through a Many-Facet Rasch Measurement model. In this way, we can observe the emergence of difficulty levels of the measurement construct, derived from the discriminability and representativeness of task-based items for each data vizualization, ordered into Many-Facets construct levels. In this study, we show and apply each step of the methodology to an item bank, which models the difficulty levels of a measurement construct approximating a latent construct for data visualization literacy. This measurement construct is drawn from semiotics, i.e., based on the syntax, semantics and pragmatics knowledge that each data visualization may require to be mastered by people. The DRIVE-T methodology operationalises an inductive approach, observable in a post-design phase of the items preparation, for formative-style and practice-based measurement construct emergence. A pilot study with items selected through the application of DRIVE-T is also presented to test our approach.
Authors:Zhuohao Jerry Zhang, Haichang Li, Chun Meng Yu, Faraz Faruqi, Junan Xie, Gene S-H Kim, Mingming Fan, Angus G. Forbes, Jacob O. Wobbrock, Anhong Guo, Liang He
Abstract:
Building 3-D models is challenging for blind and low-vision (BLV) users due to the inherent complexity of 3-D models and the lack of support for non-visual interaction in existing tools. To address this issue, we introduce A11yShape, a novel system designed to help BLV users who possess basic programming skills understand, modify, and iterate on 3-D models. A11yShape leverages LLMs and integrates with OpenSCAD, a popular open-source editor that generates 3-D models from code. Key functionalities of A11yShape include accessible descriptions of 3-D models, version control to track changes in models and code, and a hierarchical representation of model components. Most importantly, A11yShape employs a cross-representation highlighting mechanism to synchronize semantic selections across all model representations -- code, semantic hierarchy, AI description, and 3-D rendering. We conducted a multi-session user study with four BLV programmers, where, after an initial tutorial session, participants independently completed 12 distinct models across two testing sessions, achieving results that aligned with their own satisfaction. The result demonstrates that participants were able to comprehend provided 3-D models, as well as independently create and modify 3-D models -- tasks that were previously impossible without assistance from sighted individuals.
Authors:Michael D. Ekstrand, Afsaneh Razi, Aleksandra Sarcevic, Maria Soledad Pera, Robin Burke, Katherine Landau Wright
Abstract:
Recommender systems are usually designed by engineers, researchers, designers, and other members of development teams. These systems are then evaluated based on goals set by the aforementioned teams and other business units of the platforms operating the recommender systems. This design approach emphasizes the designers' vision for how the system can best serve the interests of users, providers, businesses, and other stakeholders. Although designers may be well-informed about user needs through user experience and market research, they are still the arbiters of the system's design and evaluation, with other stakeholders' interests less emphasized in user-centered design and evaluation. When extended to recommender systems for social good, this approach results in systems that reflect the social objectives as envisioned by the designers and evaluated as the designers understand them. Instead, social goals and operationalizations should be developed through participatory and democratic processes that are accountable to their stakeholders. We argue that recommender systems aimed at improving social good should be designed *by* and *with*, not just *for*, the people who will experience their benefits and harms. That is, they should be designed in collaboration with their users, creators, and other stakeholders as full co-designers, not only as user study participants.
Authors:Marco T. Morazán, David Anthony K. Fields, Andrés M. Garced, Tijana MiniÄ
Abstract:
In Formal Languages and Automata Theory courses, students find understanding nondeterministic finite-state and pushdown automata difficult. In many cases, this means that it is challenging for them to comprehend the operational semantics of such machines and, as a consequence, determine why a word is accepted or rejected. This is not entirely surprising, because students are mostly trained to design and implement deterministic programs. Comprehension of pushdown automata is further complicated, because reasoning about the stack is necessary. A common difficulty students face, for example, is understanding that two different computations on the same word may reach the same state with different stack values. To aid student understanding, we present two novel dynamic visualization tools for FSM -- a domain-specific programming language for the Automata Theory classroom -- to support the design of such machines. These tools visualize all computations that may be performed, respectively, by a nondeterministic finite-state machine or by a pushdown automata in a stepwise manner. In addition, these tools aid the machine verification process by allowing users to visually validate whether the properties a state represents hold when a machine transitions into it.
Authors:Marco T. Morazán, Shamil Dzhatdoyev, Josephine Des Rosiers, Tijana MiniÄ, Andrés M. Garced, David Anthony K. Fields
Abstract:
This article presents a novel framework to provide Formal Languages and Automata Theory students design support for the development of regular expressions. This framework includes a design recipe for regular expressions and a customized error messaging system. The error messaging system produces recipe-based errors that include the step of the design recipe not successfully completed. Furthermore, the error messages follow the established practices of being concise, succinct, jargon-free, and nonprescriptive. In addition, a shorthand syntax developed for writing unit tests is described. The in-class use of the design recipe is illustrated, two debugging sessions using the described system are discussed, and the implementation of the error messaging system is briefly sketched.
Authors:Marco T. Morazán, Oliwia Kempinski, Andrés M. Garced
Abstract:
Many Formal Languages and Automata Theory courses introduce students to Turing machine extensions. One of the most widely-used extensions endows Turing machines with multiple tapes. Although multitape Turing machines are an abstraction to simplify Turing machine design, students find them no less challenging. To aid students in understanding these machines, the FSM programming language provides support for their definition and execution. This, however, has proven insufficient for many students to understand the operational semantics of such machines and to understand why such machines accept or reject a word. To address this problem, three visualization tools have been developed. The first is a dynamic visualization tool that simulates machine execution. The second is a static visualization tool that automatically renders a graphic for a multitape Turing machine's transition diagram. The third is a static visualization tool that automatically renders computation graphs for multitape Turing machines. This article presents these tools and illustrates how they are used to help students design and implement multitape Turing machines. In addition, empirical data is presented that suggests these tools are well-received and found useful by students.
Authors:Zhuohao Jerry Zhang, Ruiqi Chen, Mingyuan Zhong, Jacob O. Wobbrock
Abstract:
Automated evaluation of specific graphic designs like presentation slides is an open problem. We present SlideAudit, a dataset for automated slide evaluation. We collaborated with design experts to develop a thorough taxonomy of slide design flaws. Our dataset comprises 2400 slides collected and synthesized from multiple sources, including a subset intentionally modified with specific design problems. We then fully annotated them using our taxonomy through strictly trained crowdsourcing from Prolific. To evaluate whether AI is capable of identifying design flaws, we compared multiple large language models under different prompting strategies, and with an existing design critique pipeline. We show that AI models struggle to accurately identify slide design flaws, with F1 scores ranging from 0.331 to 0.655. Notably, prompting techniques leveraging our taxonomy achieved the highest performance. We further conducted a remediation study to assess AI's potential for improving slides. Among 82.0% of slides that showed significant improvement, 87.8% of them were improved more with our taxonomy, further demonstrating its utility.
Authors:Baoquan Zhao, Xiaofan Ma, Qianshi Pang, Ruomei Wang, Fan Zhou, Shujin Lin
Abstract:
The widespread adoption of digital technology has ushered in a new era of digital transformation across all aspects of our lives. Online learning, social, and work activities, such as distance education, videoconferencing, interviews, and talks, have led to a dramatic increase in speech-rich video content. In contrast to other video types, such as surveillance footage, which typically contain abundant visual cues, speech-rich videos convey most of their meaningful information through the audio channel. This poses challenges for improving content consumption using existing visual-based video summarization, navigation, and exploration systems. In this paper, we present VisAug, a novel interactive system designed to enhance speech-rich video navigation and engagement by automatically generating informative and expressive visual augmentations based on the speech content of videos. Our findings suggest that this system has the potential to significantly enhance the consumption and engagement of information in an increasingly video-driven digital landscape.
Authors:Jingyan Wang, Yang Zhao, Haotian Mao, Xubo Yang
Abstract:
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation, and their integration with Extended Reality (XR) is poised to transform how users interact with immersive environments. This survey provides a comprehensive review of recent developments at the intersection of LLMs and XR, offering a structured organization of research along both technical and application dimensions. We propose a taxonomy of LLM-enhanced XR systems centered on key technical paradigms -- such as interactive agent control, XR development toolkits, and generative scene synthesis -- and discuss how these paradigms enable novel capabilities in XR. In parallel, we examine how LLM-driven techniques support practical XR applications across diverse domains, including immersive education, clinical healthcare, and industrial manufacturing. By connecting these technical paradigms with application frontiers, our survey highlights current trends, delineates design considerations, and identifies open challenges in building LLM-augmented XR systems. This work provides insights that can guide researchers and practitioners in advancing the state of the art in intelligent XR experiences.
Authors:Kaixuan Wang, Loraine Clarke, Carl-Cyril J Dreue, Guancheng Zhou, Jason T. Jacques
Abstract:
Online communities serve as essential support channels for People Who Use Drugs (PWUD), providing access to peer support and harm reduction information. The moderation of these communities involves consequential decisions affecting member safety, yet existing sociotechnical systems provide insufficient support for moderators. Through interviews with experienced moderators from PWUD forums on Reddit, we analyse the unique nature of this work. We argue that this work constitutes a distinct form of public health intervention characterised by three moderation challenges: the need for specialised, expert risk assessment; time-critical crisis response; and the navigation of a structural conflict between platform policies and community safety goals. We demonstrate how current moderation systems are insufficient in supporting PWUD communities. For example, policies minimising platforms' legal exposure to illicit activities can inadvertently push moderators to implement restrictive rules to protect community's existence, which can limit such a vulnerable group's ability to share potentially life-saving resources online. We conclude by identifying two necessary shifts in sociotechnical design to support moderators' work: first, moving to automated tools that support human sensemaking in contexts with competing interests; and second, shifting from systems that require moderators to perform low-level rule programming to those that enable high-level, example-based instruction. Further, we highlight how the design of sociotechnical systems in online spaces could impact harm reduction efforts aimed at improving health outcomes for PWUD communities.
Authors:Rijul Jain, Shraddha Barke, Gabriel Ebner, Md Rakib Hossain Misu, Shan Lu, Sarah Fakhoury
Abstract:
Proof-oriented programming languages (POPLs) empower developers to write code alongside formal correctness proofs, providing formal guarantees that the code adheres to specified requirements. Despite their powerful capabilities, POPLs present a steep learning curve and have not yet been adopted by the broader software community. The lack of understanding about the proof-development process and how expert proof developers interact with POPLs has hindered the advancement of effective proof engineering and the development of proof-synthesis models/tools.
In this work, we conduct a user study, involving the collection and analysis of fine-grained source code telemetry from eight experts working with two languages, F* and Verus. Results reveal interesting trends and patterns about how experts reason about proofs and key challenges encountered during the proof development process. We identify three distinct strategies and multiple informal practices that are not captured final code snapshots, yet are predictive of task outcomes. We translate these findings into concrete design guidance for AI proof assistants: bias toward early specification drafting, explicit sub-goal decomposition, bounded active errors, and disciplined verifier interaction. We also present a case study of an F* proof agent grounded in these recommendations, and demonstrate improved performance over baseline LLMs
Authors:Raj Mahmud, Shlomo Berkovsky, Mukesh Prasad, A. Baki Kocaballi
Abstract:
Conversational Recommender Systems (CRSs) deliver personalised recommendations through multi-turn natural language dialogue and increasingly support both task-oriented and exploratory interactions. Yet, the factors shaping user interaction preferences remain underexplored. In this within-subjects study (\(N = 139\)), participants experienced two scripted CRS dialogues, rated their experiences, and indicated the importance of eight system qualities. Logistic regression revealed that preference for the exploratory interaction was predicted by enjoyment, usefulness, novelty, and conversational quality. Unexpectedly, perceived effectiveness was also associated with exploratory preference. Clustering uncovered five latent user profiles with distinct dialogue style preferences. Moderation analyses indicated that age, gender, and control preference significantly influenced these choices. These findings integrate affective, cognitive, and trait-level predictors into CRS user modelling and inform autonomy-sensitive, value-adaptive dialogue design. The proposed predictive and adaptive framework applies broadly to conversational AI systems seeking to align dynamically with evolving user needs.
Authors:Lei Han, Mingnan Wei, Qiongyan Chen, Anqi Wang, Rong Pang, Kefei Liu, Rongrong Chen, David Yip
Abstract:
Photo-based reminiscence has the potential to have a positive impact on older adults' reconnection with their personal history and improve their well-being. Supporting reminiscence in older adults through technological implementations is becoming an increasingly important area of research in the fields of HCI and CSCW. However, the impact of integrating gaze and speech as mixed-initiative interactions in LLM-powered reminiscence conversations remains under-explored. To address this, we conducted expert interviews to understand the challenges that older adults face with LLM-powered, photo-based reminiscence experiences. Based on these design considerations, we developed Eye2Recall, a system that integrates eye tracking for detecting visual interest with natural language interaction to create a mixed-initiative reminiscence experience. We evaluated its effectiveness through a user study involving ten older adults. The results have important implications for the future design of more accessible and empowering reminiscence technologies that better align with older adults' natural interaction patterns and enhance their positive aging.
Authors:Raj Mahmud, Yufeng Wu, Abdullah Bin Sawad, Shlomo Berkovsky, Mukesh Prasad, A. Baki Kocaballi
Abstract:
Conversational Recommender Systems (CRSs) are receiving growing research attention across domains, yet their user experience (UX) evaluation remains limited. Existing reviews largely overlook empirical UX studies, particularly in adaptive and large language model (LLM)-based CRSs. To address this gap, we conducted a systematic review following PRISMA guidelines, synthesising 23 empirical studies published between 2017 and 2025. We analysed how UX has been conceptualised, measured, and shaped by domain, adaptivity, and LLM. Our findings reveal persistent limitations: post hoc surveys dominate, turn-level affective UX constructs are rarely assessed, and adaptive behaviours are seldom linked to UX outcomes. LLM-based CRSs introduce further challenges, including epistemic opacity and verbosity, yet evaluations infrequently address these issues. We contribute a structured synthesis of UX metrics, a comparative analysis of adaptive and nonadaptive systems, and a forward-looking agenda for LLM-aware UX evaluation. These findings support the development of more transparent, engaging, and user-centred CRS evaluation practices.
Authors:Haozhe Zhou, Riku Arakawa, Yuvraj Agarwal, Mayank Goel
Abstract:
IMUs are regularly used to sense human motion, recognize activities, and estimate full-body pose. Users are typically required to place sensors in predefined locations that are often dictated by common wearable form factors and the machine learning model's training process. Consequently, despite the increasing number of everyday devices equipped with IMUs, the limited adaptability has seriously constrained the user experience to only using a few well-explored device placements (e.g., wrist and ears). In this paper, we rethink IMU-based motion sensing by acknowledging that signals can be captured from any point on the human body. We introduce IMU over Continuous Coordinates (IMUCoCo), a novel framework that maps signals from a variable number of IMUs placed on the body surface into a unified feature space based on their spatial coordinates. These features can be plugged into downstream models for pose estimation and activity recognition. Our evaluations demonstrate that IMUCoCo supports accurate pose estimation in a wide range of typical and atypical sensor placements. Overall, IMUCoCo supports significantly more flexible use of IMUs for motion sensing than the state-of-the-art, allowing users to place their sensors-laden devices according to their needs and preferences. The framework also supports the ability to change device locations depending on the context and suggests placement depending on the use case.
Authors:Mansi Sharma, Shuang Chen, Philipp Müller, Maurice Rekrut, Antonio Krüger
Abstract:
For machines to effectively assist humans in challenging visual search tasks, they must differentiate whether a human is simply glancing into a scene (navigational intent) or searching for a target object (informational intent). Previous research proposed combining electroencephalography (EEG) and eye-tracking measurements to recognize such search intents implicitly, i.e., without explicit user input. However, the applicability of these approaches to real-world scenarios suffers from two key limitations. First, previous work used fixed search times in the informational intent condition -- a stark contrast to visual search, which naturally terminates when the target is found. Second, methods incorporating EEG measurements addressed prediction scenarios that require ground truth training data from the target user, which is impractical in many use cases. We address these limitations by making the first publicly available EEG and eye-tracking dataset for navigational vs. informational intent recognition, where the user determines search times. We present the first method for cross-user prediction of search intents from EEG and eye-tracking recordings and reach 84.5% accuracy in leave-one-user-out evaluations -- comparable to within-user prediction accuracy (85.5%) but offering much greater flexibility
Authors:Jianhua Li, Shang Gao, Michelle Harvey, Trina Myers
Abstract:
While voucher incentives have been popular for primary and secondary schools, they are less used in higher education. In this study, we leverage industry voucher incentives to inspire students in cybersecurity education (CSE). We adopt a 100% portfolio-based assessment strategy, where students can freely select their target grades in the investigated unit. We purposely design one of the high distinction (HD) tasks to be obtaining an industry certificate and provide vouchers to those who can accomplish a predefined set of tasks before a midpoint. The voucher recipients will use the voucher to access the industry certificate training materials and sit the certificate exam for free. Passing the certificate exam is one of the conditions for gaining an HD grade. Our survey and interviews reveal a substantial influence of voucher incentives on students' career aspirations. In light of the findings, recommendations on adopting voucher incentives in CSE or broader ICT education are offered for institutions and researchers.
Authors:Rukshani Somarathna, Madhawa Perera, Tom Gedeon, Matt Adcock
Abstract:
Explainability remains a critical challenge in artificial intelligence (AI) systems, particularly in high stakes domains such as healthcare, finance, and decision support, where users must understand and trust automated reasoning. Traditional explainability methods such as feature importance and post-hoc justifications often fail to capture the cognitive processes that underlie human decision making, leading to either too technical or insufficiently meaningful explanations. We propose a novel appraisal based framework inspired by the Component Process Model (CPM) for explainability to address this gap. While CPM has traditionally been applied to emotion research, we use its appraisal component as a cognitive model for generating human aligned explanations. By structuring explanations around key appraisal dimensions such as relevance, implications, coping potential, and normative significance our framework provides context sensitive, cognitively meaningful justifications for AI decisions. This work introduces a new paradigm for generating intuitive, human-centred explanations in AI driven systems by bridging cognitive science and explainable AI.
Authors:Douglas Markant, Subham Sah, Alireza Karduni, Milad Rogha, My Thai, Wenwen Dou
Abstract:
Political sectarianism is fueled in part by misperceptions of political opponents: People commonly overestimate the support for extreme policies among members of the other party. Research suggests that correcting partisan misperceptions by informing people about the actual views of outparty members may reduce one's own expressed support for political extremism, including partisan violence and anti-democratic actions. The present study investigated how correction effects depend on different representations of outparty views communicated through data visualizations. We conducted an experiment with U.S. based participants from Prolific (N=239 Democrats, N=244 Republicans). Participants made predictions about support for political violence and undemocratic practices among members of their political outparty. They were then presented with data from an earlier survey on the actual views of outparty members. Some participants viewed only the average response (Mean-Only condition), while other groups were shown visual representations of the range of views from 75% of the outparty (Mean+Interval condition) or the full distribution of responses (Mean+Points condition). Compared to a control group that was not informed about outparty views, we observed the strongest correction effects among participants in the Mean-only and Mean+Points condition, while correction effects were weaker in the Mean+Interval condition. In addition, participants who observed the full distribution of out-party views (Mean+Points condition) were most accurate at later recalling the degree of support among the outparty. Our findings suggest that data visualizations can be an important tool for correcting pervasive distortions in beliefs about other groups. However, the way in which variability in outparty views is visualized can significantly shape how people interpret and respond to corrective information.
Authors:Esen K. Tütüncü, Mar Gonzalez-Franco, Eric J. Gonzalez
Abstract:
We present HandOver, an extended reality (XR) interaction technique designed to unify the precision of traditional mouse input for object selection with the expressiveness of hand-tracking for object manipulation. With HandOver, the mouse is used to drive a depth-aware 3D cursor enabling precise and restful targeting -by hovering their hand over the mouse, the user can then seamlessly transition into direct 3D manipulation of the target object. In a formal user study, we compare HandOver against two raybased techniques: traditional raycasting (Ray) and a hybrid method (Ray+Hand) in a 3D docking task. Results show HandOver yields lower task errors across all distances, and moreover improves interaction ergonomics as highlighted by a RULA posture analysis and self-reported measures (NASA-TLX). These findings illustrate the benefits of blending traditional precise input devices with the expressive gestural inputs afforded by hand-tracking in XR, leading to improved user comfort and task performance. This blended paradigm yields a unified workflow allowing users to leverage the best of each input modality as they interact in immersive environments.
Authors:Guilherme Guerino, Luiz Rodrigues, Luana Bianchini, Mariana Alves, Marcelo Marinho, Thomaz Veloso, Valmir Macario, Diego Dermeval, Thales Vieira, Ig Bittencourt, Seiji Isotani
Abstract:
This study explores the integration of Augmented Intelligence (AuI) in Intelligent Tutoring Systems (ITS) to address challenges in Artificial Intelligence in Education (AIED), including teacher involvement, AI reliability, and resource accessibility. We present MathAIde, an ITS that uses computer vision and AI to correct mathematics exercises from student work photos and provide feedback. The system was designed through a collaborative process involving brainstorming with teachers, high-fidelity prototyping, A/B testing, and a real-world case study. Findings emphasize the importance of a teacher-centered, user-driven approach, where AI suggests remediation alternatives while teachers retain decision-making. Results highlight efficiency, usability, and adoption potential in classroom contexts, particularly in resource-limited environments. The study contributes practical insights into designing ITSs that balanceuser needs and technological feasibility, while advancing AIED research by demonstrating the effectiveness of a mixed-methods, user-centered approach to implementing AuI in educational technologies.
Authors:Haiyun Zhang, Stefano Dalla Gasperina, Saad N. Yousaf, Toshimitsu Tsuboi, Tetsuya Narita, Ashish D. Deshpande
Abstract:
Hand exoskeletons are critical tools for dexterous teleoperation and immersive manipulation interfaces, but achieving accurate hand tracking remains a challenge due to user-specific anatomical variability and donning inconsistencies. These issues lead to kinematic misalignments that degrade tracking performance and limit applicability in precision tasks. We propose a subject-specific calibration framework for exoskeleton-based hand tracking that uses redundant joint sensing and a residual-weighted optimization strategy to estimate virtual link parameters. Implemented on the Maestro exoskeleton, our method improves joint angle and fingertip position estimation across users with varying hand geometries. We introduce a data-driven approach to empirically tune cost function weights using motion capture ground truth, enabling more accurate and consistent calibration across participants. Quantitative results from seven subjects show substantial reductions in joint and fingertip tracking errors compared to uncalibrated and evenly weighted models. Qualitative visualizations using a Unity-based virtual hand further confirm improvements in motion fidelity. The proposed framework generalizes across exoskeleton designs with closed-loop kinematics and minimal sensing, and lays the foundation for high-fidelity teleoperation and learning-from-demonstration applications.
Authors:Marta BieÅkiewicz, Julia Ayache, Panayiotis Charalambous, Cristina Becchio, Marco Corragio, Bertram Taetz, Francesco De Lellis, Antonio Grotta, Anna Server, Daniel Rammer, Richard Kulpa, Franck Multon, Azucena Garcia-Palacios, Jessica Sutherland, Kathleen Bryson, Stéphane Donikian, Didier Stricker, Benoît Bardy
Abstract:
This article explores a critical gap in Mixed Reality (MR) technology: while advances have been made, MR still struggles to authentically replicate human embodiment and socio-motor interaction. For MR to enable truly meaningful social experiences, it needs to incorporate multi-modal data streams and multi-agent interaction capabilities. To address this challenge, we present a comprehensive glossary covering key topics such as Virtual Characters and Autonomisation, Responsible AI, Ethics by Design, and the Scientific Challenges of Social MR within Neuroscience, Embodiment, and Technology. Our aim is to drive the transformative evolution of MR technologies that prioritize human-centric innovation, fostering richer digital connections. We advocate for MR systems that enhance social interaction and collaboration between humans and virtual autonomous agents, ensuring inclusivity, ethical design and psychological safety in the process.
Authors:Dong Hyun Roh, Rajesh Kumar, An Ngo
Abstract:
This paper presents a keystroke-based framework for detecting LLM-assisted cheating in Korean, addressing key gaps in prior research regarding language coverage, cognitive context, and the granularity of LLM involvement. Our proposed dataset includes 69 participants who completed writing tasks under three conditions: Bona fide writing, paraphrasing ChatGPT responses, and transcribing ChatGPT responses. Each task spans six cognitive processes defined in Bloom's Taxonomy (remember, understand, apply, analyze, evaluate, and create). We extract interpretable temporal and rhythmic features and evaluate multiple classifiers under both Cognition-Aware and Cognition-Unaware settings. Temporal features perform well under Cognition-Aware evaluation scenarios, while rhythmic features generalize better under cross-cognition scenarios. Moreover, detecting bona fide and transcribed responses was easier than paraphrased ones for both the proposed models and human evaluators, with the models significantly outperforming the humans. Our findings affirm that keystroke dynamics facilitate reliable detection of LLM-assisted writing across varying cognitive demands and writing strategies, including paraphrasing and transcribing LLM-generated responses.
Authors:Daniel Udekwe, Dimitrios Bolkas, Eren Erman Ozguven, Ren Moses, Qianwen Guo
Abstract:
Surveying is a core component of civil engineering education, requiring students to engage in hands-on spatial measurement, instrumentation handling, and field-based decision-making. However, traditional instruction often poses logistical and cognitive challenges that can hinder accessibility and student engagement. While virtual laboratories have gained traction in engineering education, few are purposefully designed to support flexible, adaptive learning in surveying. To address this gap, we developed Virtual Reality for Immersive and Interactive Surveying Education (VRISE), an immersive virtual reality laboratory that replicates ground-based and aerial surveying tasks through customizable, accessible, and user-friendly modules. VRISE features interactive experiences such as differential leveling with a digital level equipment and waypoint-based drone navigation, enhanced by input smoothing, adaptive interfaces, and real-time feedback to accommodate diverse learning styles. Evaluation across multiple user sessions demonstrated consistent gains in measurement accuracy, task efficiency, and interaction quality, with a clear progression in skill development across the ground-based and aerial surveying modalities. By reducing cognitive load and physical demands, even in tasks requiring fine motor control and spatial reasoning, VRISE demonstrates the potential of immersive, repeatable digital environments to enhance surveying education, broaden participation, and strengthen core competencies in a safe and engaging setting.
Authors:Sami Saeed Alghamdi, Christopher Bull, Ahmed Kharrufa
Abstract:
Many people learn programming independently from online resources and often report struggles in achieving their personal learning goals. Learners frequently describe their experiences as isolating and frustrating, challenged by abundant uncertainties, information overload, and distraction, compounded by limited guidance. At the same time, social media serves as a personal space where many engage in diverse self-regulation practices, including help-seeking, using external memory aids (e.g., self-notes), self-reflection, emotion regulation, and self-motivation. For instance, learners often mark achievements and set milestones through their posts. In response, we developed a system consisting of a web platform and browser extensions to support self-regulation online. The design aims to add learner-defined structure to otherwise unstructured experiences and bring meaning to curation and reflection activities by translating them into learning stories with AI-generated feedback. We position storytelling as an integrative approach to design that connects resource curation, reflective and sensemaking practice, and narrative practices learners already use across social platforms. We recruited 15 informal programming learners who are regular social media users to engage with the system in a self-paced manner; participation concluded upon submitting a learning story and survey. We used three quantitative scales and a qualitative survey to examine users' characteristics and perceptions of the system's support for their self-regulation. User feedback suggests the system's viability as a self-regulation aid. Learners particularly valued in-situ reflection, automated story feedback, and video annotation, while other features received mixed views. We highlight perceived benefits, friction points, and design opportunities for future AI-augmented self-regulation tools.
Authors:Francis Geng, Anshul Shah, Haolin Li, Nawab Mulla, Steven Swanson, Gerald Soosai Raj, Daniel Zingaro, Leo Porter
Abstract:
Background and Context. Chat-based and inline-coding-based GenAI has already had substantial impact on the CS Education community. The recent introduction of ``vibe coding'' may further transform how students program, as it introduces a new way for students to create software projects with minimal oversight.
Objectives. The purpose of this study is to understand how students in introductory programming and advanced software engineering classes interact with a vibe coding platform (Replit) when creating software and how the interactions differ by programming background.
Methods. Interview participants were asked to think-aloud while building a web application using Replit. Thematic analysis was then used to analyze the video recordings with an emphasis on the interactions between the student and Replit.
Findings. For both groups, the majority of student interactions with Replit were to test or debug the prototype and only rarely did students visit code. Prompts by advanced software engineering students were much more likely to include relevant app feature and codebase contexts than those by introductory programming students.
Authors:Nicholas Mehlman, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Kelvin Niu, Alexander H. Miller, Shagun Sodhani
Abstract:
Surface electromyography (sEMG) signals offer a promising avenue for developing innovative human-computer interfaces by providing insights into muscular activity. However, the limited volume of training data and computational constraints during deployment have restricted the investigation of scaling up the model size for solving sEMG tasks. In this paper, we demonstrate that vanilla transformer models can be effectively scaled up on sEMG data and yield improved cross-user performance up to 110M parameters, surpassing the model size regime investigated in other sEMG research (usually <10M parameters). We show that >100M-parameter models can be effectively distilled into models 50x smaller with minimal loss of performance (<1.5% absolute). This results in efficient and expressive models suitable for complex real-time sEMG tasks in real-world environments.
Authors:Kevin Pu, Ting Zhang, Naveen Sendhilnathan, Sebastian Freitag, Raj Sodhi, Tanya Jonker
Abstract:
Wearable AI systems aim to provide timely assistance in daily life, but existing approaches often rely on user initiation or predefined task knowledge, neglecting users' current mental states. We introduce ProMemAssist, a smart glasses system that models a user's working memory (WM) in real-time using multi-modal sensor signals. Grounded in cognitive theories of WM, our system represents perceived information as memory items and episodes with encoding mechanisms, such as displacement and interference. This WM model informs a timing predictor that balances the value of assistance with the cost of interruption. In a user study with 12 participants completing cognitively demanding tasks, ProMemAssist delivered more selective assistance and received higher engagement compared to an LLM baseline system. Qualitative feedback highlights the benefits of WM modeling for nuanced, context-sensitive support, offering design implications for more attentive and user-aware proactive agents.
Authors:Nishani Fernando, Bahareh Nakisa, Adnan Ahmad, Mohammad Naim Rastgoo
Abstract:
Effective human-AI teaming heavily depends on swift trust, particularly in high-stakes scenarios such as emergency response, where timely and accurate decision-making is critical. In these time-sensitive and cognitively demanding settings, adaptive explainability is essential for fostering trust between human operators and AI systems. However, existing explainable AI (XAI) approaches typically offer uniform explanations and rely heavily on explicit feedback mechanisms, which are often impractical in such high-pressure scenarios. To address this gap, we propose a conceptual framework for adaptive XAI that operates non-intrusively by responding to users' real-time cognitive and emotional states through implicit feedback, thereby enhancing swift trust in high-stakes environments. The proposed adaptive explainability trust framework (AXTF) leverages physiological and behavioral signals, such as EEG, ECG, and eye tracking, to infer user states and support explanation adaptation. At its core is a multi-objective, personalized trust estimation model that maps workload, stress, and emotion to dynamic trust estimates. These estimates guide the modulation of explanation features enabling responsive and personalized support that promotes swift trust in human-AI collaboration. This conceptual framework establishes a foundation for developing adaptive, non-intrusive XAI systems tailored to the rigorous demands of high-pressure, time-sensitive environments.
Authors:Ha Na Cho, Kyuha Jung, Daniel Eisenberg, Cheryl A. King, Kai Zheng
Abstract:
This qualitative study explores barriers to utilization of digital mental health Intervention (DMHI) among college students. Data are from a large randomized clinical trial of an intervention, eBridge, that used motivational interviewing for online counseling to connect students with mental health issues to professional services. We applied thematic analysis to analyze the feedback from the student participants regarding their experience of using the DMHI platform. We identified nine key barriers to DMHI adoption and the use of in-person mental health services: emotional distress, time constraints, privacy concerns, resource accessibility, financial challenges, medication stigma, dissatisfaction with communication, content clarity, and treatment-related concerns. Our findings emphasize the need for personalized, culturally sensitive interventions and improved strategies to enhance the access and engagement in mental health support for young adults.
Authors:Xiaotian Su, Naim Zierau, Soomin Kim, April Yi Wang, Thiemo Wambsganss
Abstract:
Social media platforms increasingly employ proactive moderation techniques, such as detecting and curbing toxic and uncivil comments, to prevent the spread of harmful content. Despite these efforts, such approaches are often criticized for creating a climate of censorship and failing to address the underlying causes of uncivil behavior. Our work makes both theoretical and practical contributions by proposing and evaluating two types of emotion monitoring dashboards to users' emotional awareness and mitigate hate speech. In a study involving 211 participants, we evaluate the effects of the two mechanisms on user commenting behavior and emotional experiences. The results reveal that these interventions effectively increase users' awareness of their emotional states and reduce hate speech. However, our findings also indicate potential unintended effects, including increased expression of negative emotions (Angry, Fear, and Sad) when discussing sensitive issues. These insights provide a basis for further research on integrating proactive emotion regulation tools into social media platforms to foster healthier digital interactions.
Authors:Robert Sparrow, Joshua Hatherley
Abstract:
In the much-celebrated book Deep Medicine, Eric Topol argues that the development of artificial intelligence for health care will lead to a dramatic shift in the culture and practice of medicine. In the next several decades, he suggests, AI will become sophisticated enough that many of the everyday tasks of physicians could be delegated to it. Topol is perhaps the most articulate advocate of the benefits of AI in medicine, but he is hardly alone in spruiking its potential to allow physicians to dedicate more of their time and attention to providing empathetic care for their patients in the future. Unfortunately, several factors suggest a radically different picture for the future of health care. Far from facilitating a return to a time of closer doctor-patient relationships, the use of medical AI seems likely to further erode therapeutic relationships and threaten professional and patient satisfaction.
Authors:Rebecca L. Pharmer, Christopher D. Wickens, Lucas Plabst, Benjamin A. Clegg, Leanne M. Hirshfield, Joanna E. Lewis, Jalynn B. Nicoly, Cara A. Spencer, Francisco R. Ortega
Abstract:
Objective: The study examined the effects of varying all three core elements of cognitive load on learning efficiency during a shape assembly task in virtual reality (VR).
Background: Adaptive training systems aim to improve learning efficiency and retention by dynamically adjusting difficulty. However, design choices can impact the cognitive workload imposed on the learner. The present experiments examined how aspects of cognitive load impact training outcomes.
Method: Participants learned step-by-step shape assembly in a VR environment. Cognitive load was manipulated across three dimensions: Intrinsic Load (shape complexity), Extraneous Load (instruction verbosity), and Germane Load (adaptive vs. fixed training). In adaptive training (experiment 1), difficulty increased based on individual performance. In fixed training (experiment 2), difficulty followed a preset schedule from a yoked participant.
Results: Higher Intrinsic Load significantly increased training times and subjective workload but did not affect retention test accuracy. Extraneous Load modestly impacted training time, with little impact on workload or retention. Adaptive training shortened overall training time without increasing workload or impairing retention. No interactions were observed between the three types of load. Conclusion: Both Intrinsic and Extraneous Load increased training time, but adaptive training improved efficiency without harming retention. The lack of interaction between the elements suggests training benefits can be worth seeking within any of the components of cognitive load. Application: These findings support the use of VR adaptive systems in domains such as manufacturing and military service, where efficient assembly skill acquisition is critical. Tailoring difficulty in real-time can optimize efficiency without compromising learning.
Authors:Naoki Okami, Kazuki Miyake, Naohisa Sakamoto, Jorji Nonaka, Takanori Fujiwara
Abstract:
Comparing tensors and identifying their (dis)similar structures is fundamental in understanding the underlying phenomena for complex data. Tensor decomposition methods help analysts extract tensors' essential characteristics and aid in visual analytics for tensors. In contrast to dimensionality reduction (DR) methods designed only for analyzing a matrix (i.e., second-order tensor), existing tensor decomposition methods do not support flexible comparative analysis. To address this analysis limitation, we introduce a new tensor decomposition method, named tensor unified linear comparative analysis (TULCA), by extending its DR counterpart, ULCA, for tensor analysis. TULCA integrates discriminant analysis and contrastive learning schemes for tensor decomposition, enabling flexible comparison of tensors. We also introduce an effective method to visualize a core tensor extracted from TULCA into a set of 2D visualizations. We integrate TULCA's functionalities into a visual analytics interface to support analysts in interpreting and refining the TULCA results. We demonstrate the efficacy of TULCA and the visual analytics interface with computational evaluations and two case studies, including an analysis of log data collected from a supercomputer.
Authors:Johannes Y. Lee, Derek Xiao, Shreyas Kaasyap, Nima R. Hadidi, John L. Zhou, Jacob Cunningham, Rakshith R. Gore, Deniz O. Eren, Jonathan C. Kao
Abstract:
We introduce LowKeyEMG, a real-time human-computer interface that enables efficient text entry using only 7 gesture classes decoded from surface electromyography (sEMG). Prior work has attempted full-alphabet decoding from sEMG, but decoding large character sets remains unreliable, especially for individuals with motor impairments. Instead, LowKeyEMG reduces the English alphabet to 4 gesture keys, with 3 more for space and system interaction, to reliably translate simple one-handed gestures into text, leveraging the recurrent transformer-based language model RWKV for efficient computation. In real-time experiments, participants achieved average one-handed keyboardless typing speeds of 23.3 words per minute with LowKeyEMG, and improved gesture efficiency by 17% (relative to typed phrase length). When typing with only 7 keys, LowKeyEMG can achieve 98.2% top-3 word accuracy, demonstrating that this low-key typing paradigm can maintain practical communication rates. Our results have implications for assistive technologies and any interface where input bandwidth is constrained.
Authors:Shailesh Mishra, Simone Colombo, Pasindu Tennage, Martin Burkhart, Bryan Ford
Abstract:
Social key recovery mechanisms enable users to recover their vaults with the help of trusted contacts, or trustees, avoiding the need for a single point of trust or memorizing complex strings. However, existing mechanisms overlook the memorability demands on users for recovery, such as the need to recall a threshold number of trustees. Therefore, we first formalize the notion of recovery metadata in the context of social key recovery, illustrating the tradeoff between easing the burden of memorizing the metadata and maintaining metadata privacy. We present Apollo, the first framework that addresses this tradeoff by distributing indistinguishable data within a user's social circle, where trustees hold relevant data and non-trustees store random data. Apollo eliminates the need to memorize recovery metadata since a user eventually gathers sufficient data from her social circle for recovery. Due to indistinguishability, Apollo protects metadata privacy by forming an anonymity set that hides the trustees among non-trustees. To make the anonymity set scalable, Apollo proposes a novel multi-layered secret sharing scheme that mitigates the overhead due to the random data distributed among non-trustees. Finally, we provide a prototype implementation of Apollo and report on its performance. Apollo reduces the chances of malicious recovery to between 0.005% and 1.8%, depending on the adversary's ability to compromise. The multi-layered design shows a latency reduction from 1.1x to 740kx compared to a single-layered approach, depending on the number of reconnections.
Authors:Ruben Janssens, Tony Belpaeme
Abstract:
Large language models have given social robots the ability to autonomously engage in open-domain conversations. However, they are still missing a fundamental social skill: making use of the multiple modalities that carry social interactions. While previous work has focused on task-oriented interactions that require referencing the environment or specific phenomena in social interactions such as dialogue breakdowns, we outline the overall needs of a multimodal system for social conversations with robots. We then argue that vision-language models are able to process this wide range of visual information in a sufficiently general manner for autonomous social robots. We describe how to adapt them to this setting, which technical challenges remain, and briefly discuss evaluation practices.
Authors:Thomas Thibault, Léa Mosesso, Camille Adam, Aurélien Tabard, Anaëlle Beignon, Nolwenn Maudet
Abstract:
Designing for sufficiency is one of many approaches that could foster more moderate and sustainable digital practices. Based on the Sustainable Information and Communication Technologies (ICT) and Human-Computer Interaction (HCI) literature, we identify five environmental settings categories. However, our analysis of three mobile OS and nine representative applications shows an overall lack of environmental concerns in settings design, leading us to identify six pervasive anti-patterns. Environmental settings, where they exist, are set on the most intensive option by default. They are not presented as such, are not easily accessible, and offer little explanation of their impact. Instead, they encourage more intensive use. Based on these findings, we create a design workbook that explores design principles for environmental settings: presenting the environmental potential of settings; shifting to environmentally neutral states; previewing effects to encourage moderate use; rethinking defaults; facilitating settings access and; exploring more frugal settings. Building upon this workbook, we discuss how settings can tie individual behaviors to systemic factors.
Authors:Thomas Roca, Anthony Cintron Roman, Jehú Torres Vega, Marcelo Duarte, Pengce Wang, Kevin White, Amit Misra, Juan Lavista Ferres
Abstract:
As AI-powered image generation improves, a key question is how well human beings can differentiate between "real" and AI-generated or modified images. Using data collected from the online game "Real or Not Quiz.", this study investigates how effectively people can distinguish AI-generated images from real ones. Participants viewed a randomized set of real and AI-generated images, aiming to identify their authenticity. Analysis of approximately 287,000 image evaluations by over 12,500 global participants revealed an overall success rate of only 62\%, indicating a modest ability, slightly above chance. Participants were most accurate with human portraits but struggled significantly with natural and urban landscapes. These results highlight the inherent challenge humans face in distinguishing AI-generated visual content, particularly images without obvious artifacts or stylistic cues. This study stresses the need for transparency tools, such as watermarks and robust AI detection tools to mitigate the risks of misinformation arising from AI-generated content
Authors:Chenyang Zhang, Tiffany S Ma, John Andrews, Eric J Gonzalez, Mar Gonzalez-Franco, Yalong Yang
Abstract:
Spatial interaction in 3D environments requires balancing efficiency and precision, which requires dynamic tracking speed adjustments. However, existing techniques often couple tracking speed adjustments directly with hand movements, reducing interaction flexibility. Inspired by the natural friction control inherent in the physical world, we introduce ForcePinch, a novel force-responsive spatial interaction method that enables users to intuitively modulate pointer tracking speed and smoothly transition between rapid and precise movements by varying their pinching force. To implement this concept, we developed a hardware prototype integrating a pressure sensor with a customizable mapping function that translates pinching force into tracking speed adjustments. We conducted a user study with 20 participants performing well-established 1D, 2D, and 3D object manipulation tasks, comparing ForcePinch against the distance-responsive technique Go-Go and speed-responsive technique PRISM. Results highlight distinctive characteristics of the force-responsive approach across different interaction contexts. Drawing on these findings, we highlight the contextual meaning and versatility of force-responsive interactions through four illustrative examples, aiming to inform and inspire future spatial interaction design.
Authors:Alice Williams, Boris Kovalerchuk
Abstract:
The visualization of multi-dimensional data with interpretable methods remains limited by capabilities for both high-dimensional lossless visualizations that do not suffer from occlusion and that are computationally capable by parameterized visualization. This paper proposes a low to high dimensional data supporting framework using lossless Concentric Coordinates that are a more compact generalization of Parallel Coordinates along with former Circular Coordinates. These are forms of the General Line Coordinate visualizations that can directly support machine learning algorithm visualization and facilitate human interaction.
Authors:Lena Cibulski, Stefan Bruckner
Abstract:
Decision-making is a central yet under-defined goal in visualization research. While existing task models address decision processes, they often neglect the conditions framing a decision. To better support decision-making tasks, we propose a characterization scheme that describes decision problems through key properties of the data, users, and task context. This scheme helps visualization researchers specify decision-support claims more precisely and informs the design of appropriate visual encodings and interactions. We demonstrate the utility of our approach by applying it to characterize decision tasks targeted by existing design studies, highlighting opportunities for future research in decision-centric visualization.
Authors:Shizhen Zhang, Shengxin Li, Quan Li
Abstract:
Adults with Attention Deficit Hyperactivity Disorder (ADHD) often experience communication challenges, primarily due to executive dysfunction and emotional dysregulation, even after years of social integration. While existing interventions predominantly target children through structured or intrusive methods, adults lack tools that translate clinical strategies into daily communication support. To address this gap, we present Understood, a Mixed Reality (MR) system implemented on Microsoft HoloLens 2, designed to assist adults with ADHD in real-world communication. Through formative semi-structured interviews and a design workshop, we identified critical communication barriers and derived design goals for the system. Understood combines three key features: (1) real-time conversation summarization to reduce cognitive load, (2) context-aware subsequent word suggestions during moments of disfluency, and (3) topic shifting detection and reminding to mitigate off-topic transitions. A within-subjects user study and expert interviews demonstrate that Understood effectively supports communication with high usability, offering a complement to therapist-mediated interventions.
Authors:Yayan Zhao, Matthew Berger
Abstract:
In this work we study the identification of spatial correlation in distributions of 2D scalar fields, presented across different forms of visual displays. We study simple visual displays that directly show color-mapped scalar fields, namely those drawn from a distribution, and whether humans can identify strongly correlated spatial regions in these displays. In this setting, the recognition of correlation requires making judgments on a set of fields, rather than just one field. Thus, in our experimental design we compare two basic visualization designs: animation-based displays against juxtaposed views of scalar fields, along different choices of color scales. Moreover, we investigate the impacts of the distribution itself, controlling for the level of spatial correlation and discriminability in spatial scales. Our study's results illustrate the impacts of these distribution characteristics, while also highlighting how different visual displays impact the types of judgments made in assessing spatial correlation. Supplemental material is available at https://osf.io/zn4qy
Authors:Fatma Betül GüreÅ, Tanya Nazaretsky, Bahar Radmehr, Martina Rau, Tanja Käser
Abstract:
Supporting students in developing effective diagnostic reasoning is a key challenge in various educational domains. Novices often struggle with cognitive biases such as premature closure and over-reliance on heuristics. Scenario-based learning (SBL) can address these challenges by offering realistic case experiences and iterative practice, but the optimal sequencing of instruction and problem-solving activities remains unclear. This study examines how personalized support can be incorporated into different instructional sequences and whether providing explicit diagnostic strategy instruction before (I-PS) or after problem-solving (PS-I) improves learning and its transfer. We employ a between-groups design in an online SBL environment called PharmaSim, which simulates real-world client interactions for pharmacy technician apprentices. Results indicate that while both instruction types are beneficial, PS-I leads to significantly higher performance in transfer tasks.
Authors:Danny D. Leybzon, Shreyas Tirumala, Nishant Jain, Summer Gillen, Michael Jackson, Cameron McPhee, Jennifer Schmidt
Abstract:
With the rise of voice-enabled artificial intelligence (AI) systems, quantitative survey researchers have access to a new data-collection mode: AI telephone surveying. By using AI to conduct phone interviews, researchers can scale quantitative studies while balancing the dual goals of human-like interactivity and methodological rigor. Unlike earlier efforts that used interactive voice response (IVR) technology to automate these surveys, voice AI enables a more natural and adaptive respondent experience as it is more robust to interruptions, corrections, and other idiosyncrasies of human speech.
We built and tested an AI system to conduct quantitative surveys based on large language models (LLM), automatic speech recognition (ASR), and speech synthesis technologies. The system was specifically designed for quantitative research, and strictly adhered to research best practices like question order randomization, answer order randomization, and exact wording.
To validate the system's effectiveness, we deployed it to conduct two pilot surveys with the SSRS Opinion Panel and followed-up with a separate human-administered survey to assess respondent experiences. We measured three key metrics: the survey completion rates, break-off rates, and respondent satisfaction scores. Our results suggest that shorter instruments and more responsive AI interviewers may contribute to improvements across all three metrics studied.
Authors:Yan Dong, Hanjie Yu, Yanran Chen, Zipeng Zhang, Qiong Wu
Abstract:
Integrating technology with the distinctive characteristics of craftsmanship has become a key issue in the field of digital craftsmanship. This paper introduces Layered Interactions, a design approach that seamlessly merges Human-Computer Interaction (HCI) technologies with traditional lacquerware craftsmanship. By leveraging the multi-layer structure and material properties of lacquerware, we embed interactive circuits and integrate programmable hardware within the layers, creating tangible interfaces that support diverse interactions. This method enhances the adaptability and practicality of traditional crafts in modern digital contexts. Through the development of a lacquerware toolkit, along with user experiments and semi-structured interviews, we demonstrate that this approach not only makes technology more accessible to traditional artisans but also enhances the materiality and emotional qualities of interactive interfaces. Additionally, it fosters mutual learning and collaboration between artisans and technologists. Our research introduces a cross-disciplinary perspective to the HCI community, broadening the material and design possibilities for interactive interfaces.
Authors:Jenny T. Liang, Chenyang Yang, Agnia Sergeyuk, Travis D. Breaux, Brad A. Myers
Abstract:
Prompting foundation models (FMs) like large language models (LLMs) have enabled new AI-powered software features (e.g., text summarization) that previously were only possible by fine-tuning FMs. Now, developers are embedding prompts in software, known as prompt programs. The process of prompt programming requires the developer to make many changes to their prompt. Yet, the questions developers ask to update their prompt is unknown, despite the answers to these questions affecting how developers plan their changes. With the growing number of research and commercial prompt programming tools, it is unclear whether prompt programmers' needs are being adequately addressed. We address these challenges by developing a taxonomy of 25 tasks prompt programmers do and 51 questions they ask, measuring the importance of each task and question. We interview 16 prompt programmers, observe 8 developers make prompt changes, and survey 50 developers. We then compare the taxonomy with 48 research and commercial tools. We find that prompt programming is not well-supported: all tasks are done manually, and 16 of the 51 questions -- including a majority of the most important ones -- remain unanswered. Based on this, we outline important opportunities for prompt programming tools.
Authors:Chase Stokes, Kylie Lin, Cindy Xiong Bearfield
Abstract:
A growing body of work on visualization affordances highlights how specific design choices shape reader takeaways from information visualizations. However, mapping the relationship between design choices and reader conclusions often requires labor-intensive crowdsourced studies, generating large corpora of free-response text for analysis. To address this challenge, we explored alternative scalable research methodologies to assess chart affordances. We test four elicitation methods from human-subject studies: free response, visualization ranking, conclusion ranking, and salience rating, and compare their effectiveness in eliciting reader interpretations of line charts, dot plots, and heatmaps. Overall, we find that while no method fully replicates affordances observed in free-response conclusions, combinations of ranking and rating methods can serve as an effective proxy at a broad scale. The two ranking methodologies were influenced by participant bias towards certain chart types and the comparison of suggested conclusions. Rating conclusion salience could not capture the specific variations between chart types observed in the other methods. To supplement this work, we present a case study with GPT-4o, exploring the use of large language models (LLMs) to elicit human-like chart interpretations. This aligns with recent academic interest in leveraging LLMs as proxies for human participants to improve data collection and analysis efficiency. GPT-4o performed best as a human proxy for the salience rating methodology but suffered from severe constraints in other areas. Overall, the discrepancies in affordances we found between various elicitation methodologies, including GPT-4o, highlight the importance of intentionally selecting and combining methods and evaluating trade-offs.
Authors:Qiong Wu, Yan Dong, Zipeng Zhang, Ruochen Hu
Abstract:
In exhibition hybrid spaces, scale consistency between real and virtual spaces is crucial for user immersion. However, there is currently a lack of systematic research to determine appropriate virtual-to-real mapping ratios. This study developed an immersive interaction system based on Intel 3D Athlete Tracking body mapping technology. Two experiments investigated the impact of virtual space and virtual avatar scale on immersion. Experiment 1 investigated 30 participants' preferences for virtual space scale, while Experiment 2 tested the effect of 6 different virtual avatar sizes (25%-150%) on immersion. A 5-point Likert scale was used to assess immersion, followed by analysis of variance and Tukey HSD post-hoc tests. Experiment 1 showed that participants preferred a virtual space ratio of 130% (mean 127.29%, SD 8.55%). Experiment 2 found that virtual avatar sizes within the 75%-100% range produced optimal immersion (p < 0.05). Immersion decreased significantly when virtual avatar sizes deviated from users' actual height (below 50% or above 125%). Participants were more sensitive to size changes in the 25%-75% range, while perception was weaker for changes in the 75%-100% range. Virtual environments slightly larger than real space (130%) and virtual avatars slightly smaller than users (75%-100%) optimize user immersion. These findings have been applied in the Intel Global Trade Center exhibition hall, demonstrating actionable insights for designing hybrid spaces that enhance immersion and coherence.
Authors:Bo Wen, Chen Wang, Qiwei Han, Raquel Norel, Julia Liu, Thaddeus Stappenbeck, Jeffrey L. Rogers
Abstract:
The integration of voice-based AI agents in healthcare presents a transformative opportunity to bridge economic and accessibility gaps in digital health delivery. This paper explores the role of large language model (LLM)-powered voice assistants in enhancing preventive care and continuous patient monitoring, particularly in underserved populations. Drawing insights from the development and pilot study of Agent PULSE (Patient Understanding and Liaison Support Engine) -- a collaborative initiative between IBM Research, Cleveland Clinic Foundation, and Morehouse School of Medicine -- we present an economic model demonstrating how AI agents can provide cost-effective healthcare services where human intervention is economically unfeasible. Our pilot study with 33 inflammatory bowel disease patients revealed that 70\% expressed acceptance of AI-driven monitoring, with 37\% preferring it over traditional modalities. Technical challenges, including real-time conversational AI processing, integration with healthcare systems, and privacy compliance, are analyzed alongside policy considerations surrounding regulation, bias mitigation, and patient autonomy. Our findings suggest that AI-driven voice agents not only enhance healthcare scalability and efficiency but also improve patient engagement and accessibility. For healthcare executives, our cost-utility analysis demonstrates huge potential savings for routine monitoring tasks, while technologists can leverage our framework to prioritize improvements yielding the highest patient impact. By addressing current limitations and aligning AI development with ethical and regulatory frameworks, voice-based AI agents can serve as a critical entry point for equitable, sustainable digital healthcare solutions.
Authors:Mahika Phutane, Aditya Vashistha
Abstract:
People with disabilities (PwD) experience disproportionately high levels of discrimination and hate online, particularly in India, where entrenched stigma and limited resources intensify these challenges. Large language models (LLMs) are increasingly used to identify and mitigate online hate, yet most research on online ableism focuses on Western audiences with Western AI models. Are these models adequately equipped to recognize ableist harm in non-Western places like India? Do localized, Indic language models perform better? To investigate, we adopted and translated a publicly available ableist speech dataset to Hindi, and prompted eight LLMs--four developed in the U.S. (GPT-4, Gemini, Claude, Llama) and four in India (Krutrim, Nanda, Gajendra, Airavata)--to score and explain ableism. In parallel, we recruited 175 PwD from both the U.S. and India to perform the same task, revealing stark differences between groups. Western LLMs consistently overestimated ableist harm, while Indic LLMs underestimated it. Even more concerning, all LLMs were more tolerant of ableism when it was expressed in Hindi and asserted Western framings of ableist harm. In contrast, Indian PwD interpreted harm through intention, relationality, and resilience--emphasizing a desire to inform and educate perpetrators. This work provides groundwork for global, inclusive standards of ableism, demonstrating the need to center local disability experiences in the design and evaluation of AI systems.
Authors:Onyinye Dibia, Mengyi Lu, Prianka Bhattacharjee, Joseph P. Near, Yuanyuan Feng
Abstract:
The increasing adoption of differential privacy (DP) leads to public-facing DP deployments by both government agencies and companies. However, real-world DP deployments often do not fully disclose their privacy guarantees, which vary greatly between deployments. Failure to disclose certain DP parameters can lead to misunderstandings about the strength of the privacy guarantee, undermining the trust in DP. In this work, we seek to inform future standards for communicating the privacy guarantees of DP deployments. Based on semi-structured interviews with 12 DP experts, we identify important DP parameters necessary to comprehensively communicate DP guarantees, and describe why and how they should be disclosed. Based on expert recommendations, we design an initial privacy label for DP to comprehensively communicate privacy guarantees in a standardized format.
Authors:Gerben van der Hoek, Bastiaan Heeren, Rogier Bos, Paul Drijvers, Johan Jeuring
Abstract:
Computer aided formative assessment can be used to enhance a learning process, for instance by providing feedback. There are many design choices for delivering feedback, that lead to a feedback strategy. In an informative feedback strategy, students do not immediately receive information about the correct response, but are offered the opportunity to retry a task to apply feedback information. In this small-scale qualitative study, we explore an informative feedback strategy designed to offer a balance between room for exploration and mitigation of learning barriers. The research questions concern the ways in which students interact with the feedback strategy and their appreciation of error-specific feedback as opposed to worked-out solutions. To answer these questions, twenty-five 15-to-17-year-old senior general secondary education students worked for approximately 20 minutes on linear and exponential extrapolation tasks in an online environment. Data included screen captures of students working with the environment and post-intervention interviews. Results showed that room for exploration offered opportunities for self-guidance while mitigation of learning barriers prevented disengagement. Furthermore, students appreciated balanced feedback. We conclude that the balanced feedback strategy yielded fruitful student-environment interactions.
Authors:Karim Benharrak, Puyuan Peng, Amy Pavel
Abstract:
Millions of people listen to podcasts, audio stories, and lectures, but editing speech remains tedious and time-consuming. Creators remove unnecessary words, cut tangential discussions, and even re-record speech to make recordings concise and engaging. Prior work automatically summarized speech by removing full sentences (extraction), but rigid extraction limits expressivity. AI tools can summarize then re-synthesize speech (abstraction), but abstraction strips the speaker's style. We present TalkLess, a system that flexibly combines extraction and abstraction to condense speech while preserving its content and style. To edit speech, TalkLess first generates possible transcript edits, selects edits to maximize compression, coverage, and audio quality, then uses a speech editing model to translate transcript edits into audio edits. TalkLess's interface provides creators control over automated edits by separating low-level wording edits (via the compression pane) from major content edits (via the outline pane). TalkLess achieves higher coverage and removes more speech errors than a state-of-the-art extractive approach. A comparison study (N=12) showed that TalkLess significantly decreased cognitive load and editing effort in speech editing. We further demonstrate TalkLess's potential in an exploratory study (N=3) where creators edited their own speech.
Authors:Karim Alghoul, Hussein Al Osman, Abdulmotaleb El Saddik
Abstract:
Human computer interaction has become integral to modern life, driven by advancements in machine learning technologies. Affective computing, in particular, has focused on systems that recognize, interpret, and respond to human emotions, often using wearable devices, which provide continuous data streams of physiological signals. Among various physiological signals, the photoplethysmogram (PPG) has gained prominence due to its ease of acquisition from widely available devices. However, the generalization of PPG-based emotion recognition models across individuals remains an unresolved challenge. This paper introduces a novel hybrid architecture that combines Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and Temporal Convolutional Networks (TCNs) to address this issue. The proposed model integrates the strengths of these architectures to improve robustness and generalization. Raw PPG signals are fed into the CNN for feature extraction. These features are processed separately by LSTM and TCN. The outputs from these components are concatenated to generate a final feature representation, which serves as the input for classifying valence and arousal, the primary dimensions of emotion. Experiments using the Photoplethysmogram Dataset for Emotional Analysis (PPGE) demonstrate that the proposed hybrid model achieves better model generalization than standalone CNN and LSTM architectures. Our results show that the proposed solution outperforms the state-of-the-art CNN architecture, as well as a CNN-LSTM model, in emotion recognition tasks with PPG signals. Using metrics such as Area Under the Curve (AUC) and F1 Score, we highlight the model's effectiveness in handling subject variability.
Authors:Emmanuel Akinrintoyo, Nicole Salomons
Abstract:
Individual cognitive stimulation therapy (iCST) is a non-pharmacological intervention for improving the cognition and quality of life of persons with dementia (PwDs); however, its effectiveness is limited by low adherence to delivery by their family members. In this work, we present the user-centered design and evaluation of a novel socially assistive robotic system to provide iCST therapy to PwDs in their homes for long-term use. We consulted with 16 dementia caregivers and professionals. Through these consultations, we gathered design guidelines and developed the prototype. The prototype was validated by testing it with three dementia professionals and five PwDs. The evaluation revealed PwDs enjoyed using the system and are willing to adopt its use over the long term. One shortcoming was the system's speech-to-text capabilities, where it frequently failed to understand the PwDs.
Authors:Minghao Cai, Guher Gorgun, Carrie Demmans Epp
Abstract:
Cognitive load is key to ensuring an optimal learning experience. However, measuring the cognitive load of educational tasks typically relies on self-report measures which has been criticized by researchers for being subjective. In this study, we investigated the feasibility of using item difficulty parameters as a proxy for measuring cognitive load in an online learning platform. Difficulty values that were derived using item-response theory were consistent with theories of how intrinsic and extraneous load contribute to cognitive load. This finding suggests that we can use item difficulty to represent intrinsic load when modelling cognitive load in learning games.
Authors:Xumeng Wang, Xiangxuan Zhang, Zhiqi Gao, Shuangcheng Jiao, Yuxin Ma
Abstract:
The boom in visualization generation tools has significantly lowered the threshold for chart authoring. Nevertheless, chart authors with an insufficient understanding of perceptual theories may encounter difficulties in evaluating the effectiveness of chart representations, thereby struggling to identify the appropriate chart design to convey the intended data patterns. To address this issue, we propose a perception simulation model that can assess the perceptual effectiveness of charts by predicting graphical patterns that chart viewers are likely to notice. The perception simulation model integrates perceptual theory into visual feature extraction of chart elements to provide interpretable model outcomes. Human perceptual results proved that the outcome of our model can simulate the perceptual grouping behaviors of most chart viewers and cover diverse perceptual results. We also embed the model into a prototype interface called PatternSight to facilitate chart authors in assessing whether the chart design can satisfy their pattern representation requirements as expected and determining feasible improvements of visual design. According to the results of a user experiment, PatternSight can effectively assist chart authors in optimizing chart design for representing data patterns.
Authors:Ke Er Amy Zhang, Jodie Jenkinson, Laura Garrison
Abstract:
We conduct a deconstructive reading of a qualitative interview study with 17 visual data journalists from newsrooms across the globe. We borrow a deconstruction approach from literary critique to explore the instability of meaning in language and reveal implicit beliefs in words and ideas. Through our analysis we surface two sets of opposing implicit beliefs in visual data journalism: objectivity/subjectivity and humanism/mechanism. We contextualize these beliefs through a genealogical analysis, which brings deconstruction theory into practice by providing a historic backdrop for these opposing perspectives. Our analysis shows that these beliefs held within visual data journalism are not self-enclosed but rather a product of external societal forces and paradigm shifts over time. Through this work, we demonstrate how thinking with critical theories such as deconstruction and genealogy can reframe "success" in visual data storytelling and diversify visualization research outcomes. These efforts push the ways in which we as researchers produce domain knowledge to examine the sociotechnical issues of today's values towards datafication and data visualization. All supplemental materials for this work are available at osf.io/5fr48.
Authors:Liu He, Yuanchao Li, Rui Feng, XinRan Han, Yin-Long Liu, Yuwei Yang, Zude Zhu, Jiahong Yuan
Abstract:
Gender bias has been widely observed in speech perception tasks, influenced by the fundamental voicing differences between genders. This study reveals a gender bias in the perception of Alzheimer's Disease (AD) speech. In a perception experiment involving 16 Chinese listeners evaluating both Chinese and Greek speech, we identified that male speech was more frequently identified as AD, with this bias being particularly pronounced in Chinese speech. Acoustic analysis showed that shimmer values in male speech were significantly associated with AD perception, while speech portion exhibited a significant negative correlation with AD identification. Although language did not have a significant impact on AD perception, our findings underscore the critical role of gender bias in AD speech perception. This work highlights the necessity of addressing gender bias when developing AD detection models and calls for further research to validate model performance across different linguistic contexts.
Authors:Daocheng Lin, Yifan Wang, Yutong Yang, Xingyu Lan
Abstract:
When designed deliberately, data visualizations can become powerful persuasive tools, influencing viewers' opinions, values, and actions. While researchers have begun studying this issue (e.g., to evaluate the effects of persuasive visualization), we argue that a fundamental mechanism of persuasion resides in rhetorical construction, a perspective inadequately addressed in current visualization research. To fill this gap, we present a focused analysis of octopus maps, a visual genre that has maintained persuasive power across centuries and achieved significant social impact. Employing rhetorical schema theory, we collected and analyzed 90 octopus maps spanning from the 19th century to contemporary times. We closely examined how octopus maps implement their persuasive intents and constructed a design space that reveals how visual metaphors are strategically constructed and what common rhetorical strategies are applied to components such as maps, octopus imagery, and text. Through the above analysis, we also uncover a set of interesting findings. For instance, contrary to the common perception that octopus maps are primarily a historical phenomenon, our research shows that they remain a lively design convention in today's digital age. Additionally, while most octopus maps stem from Western discourse that views the octopus as an evil symbol, some designs offer alternative interpretations, highlighting the dynamic nature of rhetoric across different sociocultural settings. Lastly, drawing from the lessons provided by octopus maps, we discuss the associated ethical concerns of persuasive visualization.
Authors:Xingyu Lan, Yutong Yang, Yifan Wang
Abstract:
Affective visualization design is an emerging research direction focused on communicating and influencing emotion through visualization. However, as revealed by previous research, this area is highly interdisciplinary and involves theories and practices from diverse fields and disciplines, thus awaiting analysis from more fine-grained angles. To address this need, this work focuses on a pioneering and relatively mature sub-area, affective geovisualization design, to further the research in this direction and provide more domain-specific insights. Through an analysis of a curated corpus of affective geovisualization designs using the Person-Process-Place (PPP) model from geographic theory, we derived a design taxonomy that characterizes a variety of methods for eliciting and enhancing emotions through geographic visualization. We also identified four underlying high-level design paradigms of affective geovisualization design (e.g., computational, anthropomorphic) that guide distinct approaches to linking geographic information with human experience. By extending existing affective visualization design frameworks with geographic specificity, we provide additional design examples, domain-specific analyses, and insights to guide future research and practices in this underexplored yet highly innovative domain.
Authors:Derrick Lin, Tracie Tran, Shravan Thaploo, Jose Gabrielle E. Matias, Joy E. Pixley, Zoran Nenadic, An H. Do
Abstract:
(Abridged) Stroke and SCI are conditions that can significantly impact the QoL of survivors in both the physical and psychosocial domains. Both diseases often result in significant motor and sensory impairments that are not fully reversible despite current available therapies. Invasive BCIs have emerged as a promising means to bypass the site of injury and potentially restore motor and sensory function. However, to maximize the utility and participant satisfaction with such technology, participants' willingness to embrace BCIs must be assessed, and placed in context with functional goals and rehabilitative priorities. Hence, we conducted a survey of a cohort of stroke (n=33), SCI (n=37), and both (n=1) participants regarding their receptiveness to invasive ECoG-based BCIs as well as to assess their goals for functional rehabilitation. Overall, participants indicated a high level of willingness to undergo surgery to implant ECoG grids for BCI technology if basic motor functions, including upper extremity, gait, bowel/bladder, and sensory function were restored. There was no correlation between participant willingness to undergo a prospective BCI implantation and the level of functional recovery offered by the BCI. Similarly, there was no correlation between willingness to undergo surgery and the participants' perceived rehabilitative priorities and level of disability. These findings indicate that participants were interested in invasive BCI technology even if only basic functions can be restored, regardless of their level of disability and their rehabilitative priorities. Such observations imply that first generation commercial invasive BCIs may not need extensive functions to garner adoption. Conversely, it also raises a concern that participants from the stroke and SCI cohort may be overly enthusiastic about such technology, which poses potential risks for medical exploitation.
Authors:Justin M. Kasowski, Apurv Varshney, Michael Beyeler
Abstract:
Visual neuroprostheses (bionic eye) aim to restore a rudimentary form of vision by translating camera input into patterns of electrical stimulation. To improve scene understanding under extreme resolution and bandwidth constraints, prior work has explored computer vision techniques such as semantic segmentation and depth estimation. However, presenting all task-relevant information simultaneously can overwhelm users in cluttered environments. We compare two complementary approaches to semantic preprocessing in immersive virtual reality: SemanticEdges, which highlights all relevant objects at once, and SemanticRaster, which staggers object categories over time to reduce visual clutter. Using a biologically grounded simulation of prosthetic vision, 18 sighted participants performed a wayfinding task in a dynamic urban environment across three conditions: edge-based baseline (Control), SemanticEdges, and SemanticRaster. Both semantic strategies improved performance and user experience relative to the baseline, with each offering distinct trade-offs: SemanticEdges increased the odds of success, while SemanticRaster boosted the likelihood of collision-free completions. These findings underscore the value of adaptive semantic preprocessing for prosthetic vision and, more broadly, may inform the design of low-bandwidth visual interfaces in XR that must balance information density, task relevance, and perceptual clarity.
Authors:Tatiana Petrova, Boris Bliznioukov, Aleksandr Puzikov, Radu State
Abstract:
The concept of the Web of Agents (WoA), which transforms the static, document-centric Web into an environment of autonomous agents acting on users' behalf, has attracted growing interest as large language models (LLMs) become more capable. However, research in this area is still fragmented across different communities. Contemporary surveys catalog the latest LLM-powered frameworks, while the rich histories of Multi-Agent Systems (MAS) and the Semantic Web are often treated as separate, legacy domains. This fragmentation obscures the intellectual lineage of modern systems and hinders a holistic understanding of the field's trajectory. We present the first comprehensive evolutionary overview of the WoA. We show that modern protocols like A2A and the MCP, are direct evolutionary responses to the well-documented limitations of earlier standards like FIPA standards and OWL-based semantic agents. To systematize this analysis, we introduce a four-axis taxonomy (semantic foundation, communication paradigm, locus of intelligence, discovery mechanism). This framework provides a unified analytical lens for comparing agent architectures across all generations, revealing a clear line of descent where others have seen a disconnect. Our analysis identifies a paradigm shift in the 'locus of intelligence': from being encoded in external data (Semantic Web) or the platform (MAS) to being embedded within the agent's core model (LLM). This shift is foundational to modern Agentic AI, enabling the scalable and adaptive systems the WoA has long envisioned. We conclude that while new protocols are essential, they are insufficient for building a robust, open, trustworthy ecosystem. Finally, we argue that the next research frontier lies in solving persistent socio-technical challenges, and we map out a new agenda focused on decentralized identity, economic models, security, and governance for the emerging WoA.
Authors:Munachimso B. Oguine, Ozioma C. Oguine, Karla Badillo-Urquiola, Oluwasogo Adekunle Okunade
Abstract:
Adolescents increasingly rely on online technologies to explore their identities, form social connections, and access information and entertainment. However, their growing digital engagement exposes them to significant online risks, particularly in underrepresented contexts like West Africa. This study investigates the online experiences of 409 secondary school adolescents in Nigeria's Federal Capital Territory (FCT), focusing on their access to technology, exposure to risks, coping strategies, key stakeholders influencing their online interactions, and recommendations for improving online safety. Using self-administered surveys, we found that while most adolescents reported moderate access to online technology and connectivity, those who encountered risks frequently reported exposure to inappropriate content and online scams. Blocking and reporting tools were the most commonly used strategies, though some adolescents responded with inaction due to limited resources or awareness. Parents emerged as the primary support network, though monitoring practices and communication varied widely. Guided by Protection Motivation Theory (PMT), our analysis interprets adolescents' online safety behaviors as shaped by both their threat perceptions and their confidence in available coping strategies. A thematic analysis of their recommendations highlights the need for greater awareness and education, parental mediation, enhanced safety tools, stricter age restrictions, improved content moderation, government accountability, and resilience-building initiatives. Our findings underscore the importance of culturally and contextually relevant interventions to empower adolescents in navigating the digital world, with implications for parents, educators, designers, and policymakers.
Authors:Arjun Madhusudan, Benjamin Watson
Abstract:
Esports athletes often reduce visual quality to improve latency and frame rate, and increase their in-game performance. Little research has examined the effects of this visuo-spatial tradeoff on performance, but we could find no work studying how players manage this tradeoff in practice. This paper is an initial examination of this question in the game Dota 2. First, we gather the game configuration data of Dota 2 players in a small survey. We learn that players do limit visual detail, particularly by turning off VSYNC, which removes rendering/display synchronization delay but permits visual "tearing". Second, we survey the intent of those same players with a few subjective questions. Player intent matches configuration practice. While our sampling of Dota 2 players may not be representative, our survey does reveal suggestive trends that lay the groundwork for future, more rigorous and larger surveys. Such surveys can help new players adapt to the game more quickly, encourage researchers to investigate the relative importance of temporal and visual detail, and justify design effort by developers in "low visual" game configurations.
Authors:Milena Pustet, Elisabeth Steffen, Helena MihaljeviÄ, Grischa Stanjek, Yannis Illies
Abstract:
The role of civil society organizations (CSOs) in monitoring harmful online content is increasingly crucial, especially as platform providers reduce their investment in content moderation. AI tools can assist in detecting and monitoring harmful content at scale. However, few open-source tools offer seamless integration of AI models and social media monitoring infrastructures. Given their thematic expertise and contextual understanding of harmful content, CSOs should be active partners in co-developing technological tools, providing feedback, helping to improve models, and ensuring alignment with stakeholder needs and values, rather than as passive 'consumers'. However, collaborations between the open source community, academia, and civil society remain rare, and research on harmful content seldom translates into practical tools usable by civil society actors. This work in progress explores how CSOs can be meaningfully involved in an AI-assisted open-source monitoring tool of anti-democratic movements on Telegram, which we are currently developing in collaboration with CSO stakeholders.
Authors:Pranav Pandey, Ramviyas Parasuraman, Prashant Doshi
Abstract:
Ensuring safety in human-robot interaction (HRI) is essential to foster user trust and enable the broader adoption of robotic systems. Traditional safety models primarily rely on sensor-based measures, such as relative distance and velocity, to assess physical safety. However, these models often fail to capture subjective safety perceptions, which are shaped by individual traits and contextual factors. In this paper, we introduce and analyze a parameterized general safety model that bridges the gap between physical and perceived safety by incorporating a personalization parameter, $Ï$, into the safety measurement framework to account for individual differences in safety perception. Through a series of hypothesis-driven human-subject studies in a simulated rescue scenario, we investigate how emotional state, trust, and robot behavior influence perceived safety. Our results show that $Ï$ effectively captures meaningful individual differences, driven by affective responses, trust in task consistency, and clustering into distinct user types. Specifically, our findings confirm that predictable and consistent robot behavior as well as the elicitation of positive emotional states, significantly enhance perceived safety. Moreover, responses cluster into a small number of user types, supporting adaptive personalization based on shared safety models. Notably, participant role significantly shapes safety perception, and repeated exposure reduces perceived safety for participants in the casualty role, emphasizing the impact of physical interaction and experiential change. These findings highlight the importance of adaptive, human-centered safety models that integrate both psychological and behavioral dimensions, offering a pathway toward more trustworthy and effective HRI in safety-critical domains.
Authors:Kyla Ellahiyoun, Emma Jane Pretty, Renan Guarese, Marcel Takac, Haytham Fayek, Fabio Zambetta
Abstract:
This study explores the relationship between musical training, cognitive load (CL), and task accuracy within the virtual reality (VR) exergame Beat Saber across increasing levels of difficulty. Participants (N=32) completed a series of post-task questionnaires after playing the game under three task difficulty levels while having their physiological data measured by an Emotibit. Using regression analyses, we found that task difficulty and gaming experience significantly predicted subjective CL, whereas musical training did not. However, musical training significantly predicted higher task accuracy, along with lower subjective CL, increased gaming experience, and greater physiological arousal. These results suggest that musical training enhances task-specific performance but does not directly reduce subjective CL. Future research should consider alternative methods of grouping musical expertise and the additional predictability of flow and self-efficacy.
Authors:Zackary Rackauckas, Julia Hirschberg
Abstract:
This study investigates how stylized, voiced agents shape user interaction in a multimodal language learning environment. We conducted a mixed-methods evaluation of 54 participants interacting with anime-inspired characters powered by large language models and expressive text-to-speech synthesis. These agents responded in Japanese character language, offering users asynchronous, semi-structured conversation in varying speech styles and emotional tones. We analyzed user engagement patterns, perceived usability, emotional responses, and learning behaviors, with particular attention to how agent stylization influenced interaction across language proficiency levels and cultural backgrounds. Our findings reveal that agent design, especially voice, persona, and linguistic style, substantially affected user experience, motivation, and strategy. This work contributes to the understanding of affective, culturally stylized agents in human-agent interaction and offers guidance for designing more engaging, socially responsive systems.
Authors:Yuto Mandai, Katie Seaborn, Tomoyasu Nakano, Xin Sun, Yijia Wang, Jun Kato
Abstract:
"Kawaii" is the Japanese concept of cute, which carries sociocultural connotations related to social identities and emotional responses. Yet, virtually all work to date has focused on the visual side of kawaii, including in studies of computer agents and social robots. In pursuit of formalizing the new science of kawaii vocalics, we explored what elements of voice relate to kawaii and how they might be manipulated, manually and automatically. We conducted a four-phase study (grand N = 512) with two varieties of computer voices: text-to-speech (TTS) and game character voices. We found kawaii "sweet spots" through manipulation of fundamental and formant frequencies, but only for certain voices and to a certain extent. Findings also suggest a ceiling effect for the kawaii vocalics of certain voices. We offer empirical validation of the preliminary kawaii vocalics model and an elementary method for manipulating kawaii perceptions of computer voice.
Authors:Ben Boudaoud, Josef Spjut, Joohwan Kim, Arjun Madhusudan, Benjamin Watson
Abstract:
Historically, much research and development in human computer interaction has focused on atomic and generalizable tasks, where task completion time indicates productivity. However, the emergence of competitive games and esports reminds us of an alternative perspective on human performance in HCI: mastery of higher-level, holistic practices. Just as a world-renowned artist is rarely evaluated for their individual brush strokes, so skilled competitive gamers rarely succeed solely by completing individual mouse movements or keystrokes as quickly as possible. Instead, they optimize more task-specific skills, adeptly performing challenges deep in the learning curve for their game of choice.
Authors:Thomas Vase Schultz Volden, Oleg Jarma Montoya, Paolo Burelli, Marco Scirea
Abstract:
Video game designers often view confusion as undesirable, yet it is inevitable, as new players must adapt to new interfaces and mechanics in an increasingly varied and innovative game market, which is more popular than ever. Research suggests that confusion can contribute to a positive experience, potentially motivating players to learn. The state of confusion in video games should be further investigated to gain more insight into the learning experience of play and how it affects the player experience. In this article, we design a study to collect learning-related affects for users playing a game prototype that intentionally confuses the player. We assess the gathered affects against a complex learning model, affirming that, in specific instances, the player experience aligns with the learning experiences. Moreover, we identify correlations between these affects and the Player Experience Inventory constructs, particularly concerning flow experiences.
Authors:V. C. Storey, J. Parsons, A. Castellanos, M. Tremblay, R. Lukyanenko, W. Maass, A. Castillo
Abstract:
Machine learning enables the extraction of useful information from large, diverse datasets. However, despite many successful applications, machine learning continues to suffer from performance and transparency issues. These challenges can be partially attributed to the limited use of domain knowledge by machine learning models. This research proposes using the domain knowledge represented in conceptual models to improve the preparation of the data used to train machine learning models. We develop and demonstrate a method, called the Conceptual Modeling for Machine Learning (CMML), which is comprised of guidelines for data preparation in machine learning and based on conceptual modeling constructs and principles. To assess the impact of CMML on machine learning outcomes, we first applied it to two real-world problems to evaluate its impact on model performance. We then solicited an assessment by data scientists on the applicability of the method. These results demonstrate the value of CMML for improving machine learning outcomes.
Authors:Nima Yazdani, Aruj Mahajan, Ali Ansari
Abstract:
This paper introduces Zara, an AI-driven recruitment support system developed by micro1, as a practical case study illustrating how large language models (LLMs) can enhance the candidate experience through personalized, scalable interview support. Traditionally, recruiters have struggled to deliver individualized candidate feedback due to logistical and legal constraints, resulting in widespread candidate dissatisfaction. Leveraging OpenAI's GPT-4o, Zara addresses these limitations by dynamically generating personalized practice interviews, conducting conversational AI-driven assessments, autonomously delivering structured and actionable feedback, and efficiently answering candidate inquiries using a Retrieval-Augmented Generation (RAG) system. To promote transparency, we have open-sourced the approach Zara uses to generate candidate feedback.
Authors:Ebrahim Feghhi, Shreyas Kaasyap, Nima Hadidi, Jonathan C. Kao
Abstract:
Speech neuroprostheses aim to restore communication for people with severe paralysis by decoding speech directly from neural activity. To accelerate algorithmic progress, a recent benchmark released intracranial recordings from a paralyzed participant attempting to speak, along with a baseline decoding algorithm. Prior work on the benchmark showed impressive accuracy gains. However, these gains increased computational costs and were not demonstrated in a real-time decoding setting. Here, we make three contributions that pave the way towards accurate, efficient, and real-time neural speech decoding. First, we incorporate large amounts of time masking during training. On average, over $50\%$ of each trial is masked. Second, we replace the gated recurrent unit (GRU) architecture used in the baseline algorithm with a compact Transformer. The Transformer architecture uses $77\%$ fewer parameters, cuts peak GPU memory usage by $36\%$ relative, and is significantly faster to calibrate relative to the GRU. Third, we design a lightweight variant of an existing test-time adaptation method developed for decoding handwriting from neural activity. Our variant adapts the model using multiple time masked augmentations of a single trial and requires only one gradient step per trial. Together, these contributions reduce word error rate by $19.5\%$ and effectively mitigate performance degradations across held-out days in a real-time decoding setting while substantially lowering computational costs.
Authors:Zahra Ashktorab, Alessandra Buccella, Jason D'Cruz, Zoe Fowler, Andrew Gill, Kei Yan Leung, P. D. Magnus, John Richards, Kush R. Varshney
Abstract:
As chatbots driven by large language models (LLMs) are increasingly deployed in everyday contexts, their ability to recover from errors through effective apologies is critical to maintaining user trust and satisfaction. In a preregistered study with Prolific workers (N=162), we examine user preferences for three types of apologies (rote, explanatory, and empathic) issued in response to three categories of common LLM mistakes (bias, unfounded fabrication, and factual errors). We designed a pairwise experiment in which participants evaluated chatbot responses consisting of an initial error, a subsequent apology, and a resolution. Explanatory apologies were generally preferred, but this varied by context and user. In the bias scenario, empathic apologies were favored for acknowledging emotional impact, while hallucinations, though seen as serious, elicited no clear preference, reflecting user uncertainty. Our findings show the complexity of effective apology in AI systems. We discuss key insights such as personalization and calibration that future systems must navigate to meaningfully repair trust.
Authors:Cornelia Gruber, Helen Alber, Bernd Bischl, Göran Kauermann, Barbara Plank, Matthias AÃenmacher
Abstract:
Access to high-quality labeled data remains a limiting factor in applied supervised learning. While label variation (LV), i.e., differing labels for the same instance, is common, especially in natural language processing, annotation frameworks often still rest on the assumption of a single ground truth. This overlooks human label variation (HLV), the occurrence of plausible differences in annotations, as an informative signal. Similarly, active learning (AL), a popular approach to optimizing the use of limited annotation budgets in training ML models, often relies on at least one of several simplifying assumptions, which rarely hold in practice when acknowledging HLV. In this paper, we examine foundational assumptions about truth and label nature, highlighting the need to decompose observed LV into signal (e.g., HLV) and noise (e.g., annotation error). We survey how the AL and (H)LV communities have addressed -- or neglected -- these distinctions and propose a conceptual framework for incorporating HLV throughout the AL loop, including instance selection, annotator choice, and label representation. We further discuss the integration of large language models (LLM) as annotators. Our work aims to lay a conceptual foundation for HLV-aware active learning, better reflecting the complexities of real-world annotation.
Authors:Paulo Ricardo Knob, Leonardo Scholler, Juliano Rigatti, Soraia Raupp Musse
Abstract:
Conversational agents have made significant progress since ELIZA, expanding their role across various domains, including healthcare, education, and customer service. As these agents become increasingly integrated into daily human interactions, the need for emotional intelligence, particularly empathetic listening, becomes increasingly essential. In this study, we explore how Large Language Models (LLMs) respond when tasked with generating emotionally rich interactions. Starting from a small dataset manually crafted by an expert to reflect empathic behavior, we extended the conversations using two LLMs: ChatGPT and Gemini. We analyzed the emotional progression of the dialogues using both sentiment analysis (via VADER) and expert assessments. While the generated conversations often mirrored the intended emotional structure, human evaluation revealed important differences in the perceived empathy and coherence of the responses. These findings suggest that emotion modeling in dialogues requires not only structural alignment in the expressed emotions but also qualitative depth, highlighting the importance of combining automated and humancentered methods in the development of emotionally competent agents.
Authors:Yuhao Zhang, Jiaxin An, Ben Wang, Yan Zhang, Jiqun Liu
Abstract:
Human-centered explainability has become a critical foundation for the responsible development of interactive information systems, where users must be able to understand, interpret, and scrutinize AI-driven outputs to make informed decisions. This systematic survey of literature aims to characterize recent progress in user studies on explainability in interactive information systems by reviewing how explainability has been conceptualized, designed, and evaluated in practice. Following PRISMA guidelines, eight academic databases were searched, and 100 relevant articles were identified. A structural encoding approach was then utilized to extract and synthesize insights from these articles. The main contributions include 1) five dimensions that researchers have used to conceptualize explainability; 2) a classification scheme of explanation designs; 3) a categorization of explainability measurements into six user-centered dimensions. The review concludes by reflecting on ongoing challenges and providing recommendations for future exploration of related issues. The findings shed light on the theoretical foundations of human-centered explainability, informing the design of interactive information systems that better align with diverse user needs and promoting the development of systems that are transparent, trustworthy, and accountable.
Authors:Luke S. Snyder, Chenglong Wang, Steven M. Drucker
Abstract:
Despite the ubiquity of visualization examples published on the web, retargeting existing custom chart implementations to new datasets remains difficult, time-intensive, and tedious. The adaptation process assumes author familiarity with both the implementation of the example as well as how the new dataset might need to be transformed to fit into the example code. With recent advances in Large Language Models (LLMs), automatic adaptation of code can be achieved from high-level user prompts, reducing the barrier for visualization retargeting. To better understand how LLMs can assist retargeting and its potential limitations, we characterize and evaluate the performance of LLM assistance across multiple datasets and charts of varying complexity, categorizing failures according to type and severity. In our evaluation, we compare two approaches: (1) directly instructing the LLM model to fully generate and adapt code by treating code as text inputs and (2) a more constrained program synthesis pipeline where the LLM guides the code construction process by providing structural information (e.g., visual encodings) based on properties of the example code and data. We find that both approaches struggle when new data has not been appropriately transformed, and discuss important design recommendations for future retargeting systems.
Authors:Haonan Yao, Lingyun Yu, Lijie Yao
Abstract:
We present a systematic review on tasks, interactions, and visualization widgets (refer to tangible entities that are used to accomplish data exploration tasks through specific interactions) in the context of tangible data exploration. Tangible widgets have been shown to reduce cognitive load, enable more natural interactions, and support the completion of complex data exploration tasks. Yet, the field lacks a structured understanding of how task types, interaction methods, and widget designs are coordinated, limiting the ability to identify recurring design patterns and opportunities for innovation. To address this gap, we conduct a systematic review to analyze existing work and characterize the current design of data exploration tasks, interactions, and tangible visualization widgets. We next reflect based on our findings and propose a research agenda to inform the development of a future widget design toolkit for tangible data exploration. Our systematic review and supplemental materials are available at physicalviswidget.github.io and osf.io/vjw5e.
Authors:Leila Tavakoli, Hamed Zamani
Abstract:
Despite growing interest in using large language models (LLMs) to automate annotation, their effectiveness in complex, nuanced, and multi-dimensional labelling tasks remains relatively underexplored. This study focuses on annotation for the search clarification task, leveraging a high-quality, multi-dimensional dataset that includes five distinct fine-grained annotation subtasks. Although LLMs have shown impressive capabilities in general settings, our study reveals that even state-of-the-art models struggle to replicate human-level performance in subjective or fine-grained evaluation tasks. Through a systematic assessment, we demonstrate that LLM predictions are often inconsistent, poorly calibrated, and highly sensitive to prompt variations. To address these limitations, we propose a simple yet effective human-in-the-loop (HITL) workflow that uses confidence thresholds and inter-model disagreement to selectively involve human review. Our findings show that this lightweight intervention significantly improves annotation reliability while reducing human effort by up to 45%, offering a relatively scalable and cost-effective yet accurate path forward for deploying LLMs in real-world evaluation settings.
Authors:Olivia Figueira, Pranathi Chamarthi, Tu Le, Athina Markopoulou
Abstract:
TikTok, the social media platform that is popular among children and adolescents, offers a more restrictive "Under 13 Experience" exclusively for young users in the US, also known as TikTok's "Kids Mode". While prior research has studied various aspects of TikTok's regular mode, including privacy and personalization, TikTok's Kids Mode remains understudied, and there is a lack of transparency regarding its content curation and its safety and privacy protections for children. In this paper, (i) we propose an auditing methodology to comprehensively investigate TikTok's Kids Mode and (ii) we apply it to characterize the platform's content curation and determine the prevalence of child-directed content, based on regulations in the Children's Online Privacy Protection Act (COPPA). We find that 83% of videos observed on the "For You" page in Kids Mode are actually not child-directed, and even inappropriate content was found. The platform also lacks critical features, namely parental controls and accessibility settings. Our findings have important design and regulatory implications, as children may be incentivized to use TikTok's regular mode instead of Kids Mode, where they are known to be exposed to further safety and privacy risks.
Authors:Blade Frisch, Betts Peters, Keith Vertanen
Abstract:
Little research has explored the communication needs of autistic adults. Augmentative and alternative communication (AAC) can support these communication needs, but more guidance is needed on how to design AAC systems to support this population. We conducted an online, asynchronous, text-based focus group with five autistic adults to explore their social communication and community engagement and how AAC might support them. Our analysis found 1) participants' emotional experiences impact the communication methods they use, 2) speaking autistic adults can benefit from AAC use, and 3) autistic shutdown creates dynamic communication needs. We present implications for AAC interface design: supporting communication during shutdown, indicating communication ability, and addressing the fear of using AAC. We provide themes for future autism research: exploring the impact of a late diagnosis, understanding communication needs during shutdown, and researching the social and environmental factors that impact communication. Finally, we provide guidance for future online focus groups.
Authors:Hoang-Son Vo, Karina Kolmogortseva, Ngumimi Karen Iyortsuun, Hong-Duyen Vo, Soo-Hyung Kim
Abstract:
A large amount of valuable academic content is only available in its original language, creating a significant access barrier for the global student community. This is a challenge for translating in several subjects, such as history, culture, and the arts, where current automated subtitle tools fail to convey the appropriate pedagogical tone and specialized meaning. In addition, reading traditional automated subtitles increases cognitive load and leads to a disconnected learning experience. Through a mixed-methods study involving 36 participants, we found that GlobalizeEds dubbed formats significantly reduce cognitive load and offer a more immersive learning experience compared to traditional subtitles. Although learning effectiveness was comparable between high-quality subtitles and dubbed formats, both groups valued GlobalizeEds ability to preserve the speakers voice, which enhanced perceived authenticity. Instructors rated translation accuracy and vocal naturalness, whereas students reported that synchronized, identity-preserving outputs fostered engagement and trust. This work contributes a novel human-centered AI framework for cross-lingual education, demonstrating how multimodal translation systems can balance linguistic fidelity, cultural adaptability, and user control to create more inclusive global learning experiences.
Authors:Mohammad Javad Ranjbar Kalahroodi, Heshaam Faili, Azadeh Shakery
Abstract:
Persian Language, despite being spoken by over 100 million people worldwide, remains severely underrepresented in high-quality speech corpora, particularly for text-to-speech (TTS) synthesis applications. Existing Persian speech datasets are typically smaller than their English counterparts, which creates a key limitation for developing Persian speech technologies. We address this gap by introducing ParsVoice, the largest Persian speech corpus designed specifically for TTS applications. We created an automated pipeline that transforms raw audiobook content into TTS-ready data, incorporating components such as a BERT-based sentence completion detector, a binary search boundary optimization method for precise audio-text alignment, and multi-dimensional quality assessment frameworks tailored to Persian. The pipeline processes 2,000 audiobooks, yielding 3,526 hours of clean speech, which was further filtered into a 1,804-hour high-quality subset suitable for TTS, featuring more than 470 speakers. ParsVoice is the largest high-quality Persian speech dataset, offering speaker diversity and audio quality comparable to major English corpora. The complete dataset has been made publicly available to accelerate the development of Persian speech technologies and to serve as a template for other low-resource languages. The ParsVoice dataset is publicly available at ParsVoice (https://huggingface.co/datasets/MohammadJRanjbar/ParsVoice).
Authors:Matan Solomon, Ofra Amir, Omer Ben-Porat
Abstract:
Reinforcement learning agents are often updated with human feedback, yet such updates can be unreliable: reward misspecification, preference conflicts, or limited data may leave policies unchanged or even worse. Because policies are difficult to interpret directly, users face the challenge of deciding whether an update has truly helped. We propose that assessing model updates -- not just a single model -- is a critical design challenge for intelligent user interfaces. In a controlled study, participants provided feedback to an agent in a gridworld and then compared its original and updated policies. We evaluated four strategies for communicating updates: no demonstration, same-context, random-context, and salient-contrast demonstrations designed to highlight informative differences. Salient-contrast demonstrations significantly improved participants' ability to detect when updates helped or harmed performance, mitigating participants' bias towards assuming that feedback is always beneficial, and supported better trust calibration across contexts.
Authors:Jaeyoon Choi, Mohammad Amin Samadi, Spencer JaQuay, Seehee Park, Nia Nixon
Abstract:
Research on Collaborative Problem Solving (CPS) has traditionally examined how humans rely on one another cognitively and socially to accomplish tasks together. With the rapid advancement of AI and large language models, however, a new question emerge: what happens to team dynamics when one of the "teammates" is not human? In this study, we investigate how the integration of an AI teammate -- a fully autonomous GPT-4 agent with social, cognitive, and affective capabilities -- shapes the socio-cognitive dynamics of CPS. We analyze discourse data collected from human-AI teaming (HAT) experiments conducted on a novel platform specifically designed for HAT research. Using two natural language processing (NLP) methods, specifically Linguistic Inquiry and Word Count (LIWC) and Group Communication Analysis (GCA), we found that AI teammates often assumed the role of dominant cognitive facilitators, guiding, planning, and driving group decision-making. However, they did so in a socially detached manner, frequently pushing agenda in a verbose and repetitive way. By contrast, humans working with AI used more language reflecting social processes, suggesting that they assumed more socially oriented roles. Our study highlights how learning analytics can provide critical insights into the socio-cognitive dynamics of human-AI collaboration.
Authors:Sneha Gathani, Kevin Li, Raghav Thind, Sirui Zeng, Matthew Xu, Peter J. Haas, Cagatay Demiralp, Zhicheng Liu
Abstract:
Various analytical techniques-such as scenario modeling, sensitivity analysis, perturbation-based analysis, counterfactual analysis, and parameter space analysis-are used across domains to explore hypothetical scenarios, examine input-output relationships, and identify pathways to desired results. Although termed differently, these methods share common concepts and methods, suggesting unification under what-if analysis. Yet a unified framework to define motivations, core components, and its distinct types is lacking. To address this gap, we reviewed 141 publications from leading visual analytics and HCI venues (2014-2024). Our analysis (1) outlines the motivations for what-if analysis, (2) introduces Praxa, a structured framework that identifies its fundamental components and characterizes its distinct types, and (3) highlights challenges associated with the application and implementation. Together, our findings establish a standardized vocabulary and structural understanding, enabling more consistent use across domains and communicate with greater conceptual clarity. Finally, we identify open research problems and future directions to advance what-if analysis.
Authors:Marco Siino, Giuseppe Bonomo, Rosario Sorbello, Ilenia Tinnirello
Abstract:
The present study investigates the impact of the Rational Discrete Wavelet Transform (RDWT), used as a plug-in preprocessing step for motor imagery electroencephalographic (EEG) decoding prior to applying deep learning classifiers. A systematic paired evaluation (with/without RDWT) is conducted on four state-of-the-art deep learning architectures: EEGNet, ShallowConvNet, MBEEG\_SENet, and EEGTCNet. This evaluation was carried out across three benchmark datasets: High Gamma, BCI-IV-2a, and BCI-IV-2b. The performance of the RDWT is reported with subject-wise averages using accuracy and Cohen's kappa, complemented by subject-level analyses to identify when RDWT is beneficial. On BCI-IV-2a, RDWT yields clear average gains for EEGTCNet (+4.44 percentage points, pp; kappa +0.059) and MBEEG\_SENet (+2.23 pp; +0.030), with smaller improvements for EEGNet (+2.08 pp; +0.027) and ShallowConvNet (+0.58 pp; +0.008). On BCI-IV-2b, the enhancements observed are modest yet consistent for EEGNet (+0.21 pp; +0.044) and EEGTCNet (+0.28 pp; +0.077). On HGD, average effects are modest to positive, with the most significant gain observed for MBEEG\_SENet (+1.65 pp; +0.022), followed by EEGNet (+0.76 pp; +0.010) and EEGTCNet (+0.54 pp; +0.008). Inspection of the subject material reveals significant enhancements in challenging recordings (e.g., non-stationary sessions), indicating that RDWT can mitigate localized noise and enhance rhythm-specific information. In conclusion, RDWT is shown to be a low-overhead, architecture-aware preprocessing technique that can yield tangible gains in accuracy and agreement for deep model families and challenging subjects.
Authors:Upasana Tiwari, Rupayan Chakraborty, Sunil Kumar Kopparapu
Abstract:
Effectiveness of speech emotion recognition in real-world scenarios is often hindered by noisy environments and variability across datasets. This paper introduces a two-step approach to enhance the robustness and generalization of speech emotion recognition models through improved representation learning. First, our model employs EDRL (Emotion-Disentangled Representation Learning) to extract class-specific discriminative features while preserving shared similarities across emotion categories. Next, MEA (Multiblock Embedding Alignment) refines these representations by projecting them into a joint discriminative latent subspace that maximizes covariance with the original speech input. The learned EDRL-MEA embeddings are subsequently used to train an emotion classifier using clean samples from publicly available datasets, and are evaluated on unseen noisy and cross-corpus speech samples. Improved performance under these challenging conditions demonstrates the effectiveness of the proposed method.
Authors:Kalyan Ramana Gattoz, Prasad S. Onkar
Abstract:
In the early stages of engineering design, it is essential to know how a product behaves, especially how it moves. As designers must keep adjusting the motion until it meets the intended requirements, this process is often repetitive and time-consuming. Although the physics behind these motions is usually based on simple equations, manually working through them can be tedious and inefficient. To ease this burden, some tasks are now handled by computers. One common method involves converting hand-drawn sketches into models using CAD or CAE software. However, this approach can be time- and resource-intensive. Additionally, product sketches are usually best understood only by the designers who created them. Others may struggle to interpret them correctly, relying heavily on intuition and prior experience. Since sketches are static, they fail to show how a product moves, limiting their usefulness. This paper presents a new approach that addresses these issues by digitising the natural act of sketching. It allows designers to create, simulate, and test the motion of mechanical concepts in a more interactive way. An application was developed to evaluate this method, focusing on user satisfaction and mental workload during a design task. The results showed a 77% reduction in cognitive effort compared to traditional methods, with users reporting high satisfaction. Future work will focus on expanding this approach from 2D (planar) to full 3D (spatial) design environments, enabling more complex product concept development.
Authors:Emilio Estevan, María Sierra-Torralba, Eduardo López-Larraz, Luis Montesano
Abstract:
Wearable EEG devices have emerged as a promising alternative to polysomnography (PSG). As affordable and scalable solutions, their widespread adoption results in the collection of massive volumes of unlabeled data that cannot be analyzed by clinicians at scale. Meanwhile, the recent success of deep learning for sleep scoring has relied on large annotated datasets. Self-supervised learning (SSL) offers an opportunity to bridge this gap, leveraging unlabeled signals to address label scarcity and reduce annotation effort. In this paper, we present the first systematic evaluation of SSL for sleep staging using wearable EEG. We investigate a range of well-established SSL methods and evaluate them on two sleep databases acquired with the Ikon Sleep wearable EEG headband: BOAS, a high-quality benchmark containing PSG and wearable EEG recordings with consensus labels, and HOGAR, a large collection of home-based, self-recorded, and unlabeled recordings. Three evaluation scenarios are defined to study label efficiency, representation quality, and cross-dataset generalization. Results show that SSL consistently improves classification performance by up to 10% over supervised baselines, with gains particularly evident when labeled data is scarce. SSL achieves clinical-grade accuracy above 80% leveraging only 5% to 10% of labeled data, while the supervised approach requires twice the labels. Additionally, SSL representations prove robust to variations in population characteristics, recording environments, and signal quality. Our findings demonstrate the potential of SSL to enable label-efficient sleep staging with wearable EEG, reducing reliance on manual annotations and advancing the development of affordable sleep monitoring systems.
Authors:Rebecca Westhäußer, Wolfgang Minker, Sebatian Zepf
Abstract:
Large language models (LLMs) increasingly serve as the central control unit of AI agents, yet current approaches remain limited in their ability to deliver personalized interactions. While Retrieval Augmented Generation enhances LLM capabilities by improving context-awareness, it lacks mechanisms to combine contextual information with user-specific data. Although personalization has been studied in fields such as human-computer interaction or cognitive science, existing perspectives largely remain conceptual, with limited focus on technical implementation. To address these gaps, we build on a unified definition of personalization as a conceptual foundation to derive technical requirements for adaptive, user-centered LLM-based agents. Combined with established agentic AI patterns such as multi-agent collaboration or multi-source retrieval, we present a framework that integrates persistent memory, dynamic coordination, self-validation, and evolving user profiles to enable personalized long-term interactions. We evaluate our approach on three public datasets using metrics such as retrieval accuracy, response correctness, or BertScore. We complement these results with a five-day pilot user study providing initial insights into user feedback on perceived personalization. The study provides early indications that guide future work and highlights the potential of integrating persistent memory and user profiles to improve the adaptivity and perceived personalization of LLM-based agents.
Authors:Dave Murray-Rust, Kars Alfrink, Cristina Zaga
Abstract:
Artificial intelligence has become a part of the provision of governmental services, from making decisions about benefits to issuing fines for parking violations. However, AI systems rarely live up to the promise of neutral optimisation, creating biased or incorrect outputs and reducing the agency of both citizens and civic workers to shape the way decisions are made. Transparency is a principle that can both help subjects understand decisions made about them and shape the processes behind those decisions. However, transparency as practiced around AI systems tends to focus on the production of technical objects that represent algorithmic aspects of decision making. These are often difficult for publics to understand, do not connect to potential for action, and do not give insight into the wider socio-material context of decision making. In this paper, we build on existing approaches that take a human-centric view on AI transparency, combined with a socio-technical systems view, to develop the concept of meaningful transparency for civic AI systems: transparencies that allow publics to engage with AI systems that affect their lives, connecting understanding with potential for action.
Authors:Yi-Chi Liao, João Belo, Hee-Seung Moon, Jürgen Steimle, Anna Maria Feit
Abstract:
Human-in-the-loop optimization identifies optimal interface designs by iteratively observing user performance. However, it often requires numerous iterations due to the lack of prior information. While recent approaches have accelerated this process by leveraging previous optimization data, collecting user data remains costly and often impractical. We present a conceptual framework, Human-in-the-Loop Optimization with Model-Informed Priors (HOMI), which augments human-in-the-loop optimization with a training phase where the optimizer learns adaptation strategies from diverse, synthetic user data generated with predictive models before deployment. To realize HOMI, we introduce Neural Acquisition Function+ (NAF+), a Bayesian optimization method featuring a neural acquisition function trained with reinforcement learning. NAF+ learns optimization strategies from large-scale synthetic data, improving efficiency in real-time optimization with users. We evaluate HOMI and NAF+ with mid-air keyboard optimization, a representative VR input task. Our work presents a new approach for more efficient interface adaptation by bridging in situ and in silico optimization processes.
Authors:Saeideh Bakhshi, Phuong Mai Nguyen, Robert Schiller, Tiantian Xu, Pawan Kodandapani, Andrew Levine, Cayman Simpson, Qifan Wang
Abstract:
Recommendation systems have traditionally relied on short-term engagement signals, such as clicks and likes, to personalize content. However, these signals are often noisy, sparse, and insufficient for capturing long-term user satisfaction and retention. We introduce Retentive Relevance, a novel content-level survey-based feedback measure that directly assesses users' intent to return to the platform for similar content. Unlike other survey measures that focus on immediate satisfaction, Retentive Relevance targets forward-looking behavioral intentions, capturing longer term user intentions and providing a stronger predictor of retention. We validate Retentive Relevance using psychometric methods, establishing its convergent, discriminant, and behavioral validity. Through large-scale offline modeling, we show that Retentive Relevance significantly outperforms both engagement signals and other survey measures in predicting next-day retention, especially for users with limited historical engagement. We develop a production-ready proxy model that integrates Retentive Relevance into the final stage of a multi-stage ranking system on a social media platform. Calibrated score adjustments based on this model yield substantial improvements in engagement, and retention, while reducing exposure to low-quality content, as demonstrated by large-scale A/B experiments. This work provides the first empirically validated framework linking content-level user perceptions to retention outcomes in production systems. We offer a scalable, user-centered solution that advances both platform growth and user experience. Our work has broad implications for responsible AI development.
Authors:Nate Laffan, Ashley Hom, Andrea Nadine Castillo, Elizabeth Gitelman, Rebecca Zhao, Nikita Shenoy, Kaia Rae Schweig, Katherine Isbister
Abstract:
The Slow Space Editor is a 2D tool for creating 3D spaces. It was built as part of a research-through-design project that investigates how Virtual and Mixed Reality (XR) environments might be used for reflection and attention restoration. In this phase, we seek to radically simplify the creation of virtual environments, thereby broadening the potential group of users who could benefit from them. The research described in this paper has three aspects. First, we define the concept of "slow space," situating it alongside existing research in HCI and environmental psychology. Second, we report on a series of interviews with professional designers about how slow spaces are created in the physical world. Third, we share the design of the tool itself, focussing on the benefits of providing a simple method for users to control their environments. We conclude with our findings from a 19-person qualitative study of the tool.
Authors:Abhay Bhandarkar, Gaurav Mishra, Khushi Juchani, Harsh Singhal
Abstract:
This study applies BERTopic, a transformer-based topic modeling technique, to the lmsys-chat-1m dataset, a multilingual conversational corpus built from head-to-head evaluations of large language models (LLMs). Each user prompt is paired with two anonymized LLM responses and a human preference label, used to assess user evaluation of competing model outputs. The main objective is uncovering thematic patterns in these conversations and examining their relation to user preferences, particularly if certain LLMs are consistently preferred within specific topics. A robust preprocessing pipeline was designed for multilingual variation, balancing dialogue turns, and cleaning noisy or redacted data. BERTopic extracted over 29 coherent topics including artificial intelligence, programming, ethics, and cloud infrastructure. We analysed relationships between topics and model preferences to identify trends in model-topic alignment. Visualization techniques included inter-topic distance maps, topic probability distributions, and model-versus-topic matrices. Our findings inform domain-specific fine-tuning and optimization strategies for improving real-world LLM performance and user satisfaction.
Authors:Antonios Stamatogiannakis, Arsham Ghodsinia, Sepehr Etminanrad, Dilney Gonçalves, David Santos
Abstract:
When Artificial Intelligence (AI) is used to replace consumers (e.g., synthetic data), it is often assumed that AI emulates established consumers, and more generally human behaviors. Ten experiments with Large Language Models (LLMs) investigate if this is true in the domain of well-documented biases and heuristics. Across studies we observe four distinct types of deviations from human-like behavior. First, in some cases, LLMs reduce or correct biases observed in humans. Second, in other cases, LLMs amplify these same biases. Third, and perhaps most intriguingly, LLMs sometimes exhibit biases opposite to those found in humans. Fourth, LLMs' responses to the same (or similar) prompts tend to be inconsistent (a) within the same model after a time delay, (b) across models, and (c) among independent research studies. Such inconsistencies can be uncharacteristic of humans and suggest that, at least at one point, LLMs' responses differed from humans. Overall, unhuman-like responses are problematic when LLMs are used to mimic or predict consumer behavior. These findings complement research on synthetic consumer data by showing that sources of bias are not necessarily human-centric. They also contribute to the debate about the tasks for which consumers, and more generally humans, can be replaced by AI.
Authors:Nelaka K. A. R, Peiris M. K., Liyanage R. P. B
Abstract:
Autism Spectrum Disorder significantly influences the communication abilities, learning processes, behavior, and social interactions of individuals. Although early intervention and customized educational strategies are critical to improving outcomes, there is a pivotal gap in understanding and addressing nuanced behavioral patterns and emotional identification in autistic children prior to skill development. This extended research delves into the foundational step of recognizing and mapping these patterns as a prerequisite to improving learning and soft skills. Using a longitudinal approach to monitor emotions and behaviors, this study aims to establish a baseline understanding of the unique needs and challenges faced by autistic students, particularly in the Information Technology domain, where opportunities are markedly limited. Through a detailed analysis of behavioral trends over time, we propose a targeted framework for developing applications and technical aids designed to meet these identified needs. Our research underscores the importance of a sequential and evidence-based intervention approach that prioritizes a deep understanding of each child's behavioral and emotional landscape as the basis for effective skill development. By shifting the focus toward early identification of behavioral patterns, we aim to foster a more inclusive and supportive learning environment that can significantly improve the educational and developmental trajectory of children with ASD.
Authors:Prerana Khatiwada, Alejandro Ciuba, Aditya Nayak, Aakash Gautam, Matthew Louis Mauriello
Abstract:
Social media platforms have transformed global communication and interaction, with TikTok emerging as a critical tool for education, connection, and social impact, including in contexts where infrastructural resources are limited. Amid growing political discussions about banning platforms like TikTok, such actions can create significant ripple effects, particularly impacting marginalized communities. We present a study on Nepal, where a TikTok ban was recently imposed and lifted. As a low-resource country in transition where digital communication is rapidly evolving, TikTok enables a space for community engagement and cultural expression. In this context, we conducted an online survey (N=108) to explore user values, experiences, and strategies for navigating online spaces post-ban. By examining these transitions, we aim to improve our understanding of how digital technologies, policy responses, and cultural dynamics interact globally and their implications for governance and societal norms. Our results indicate that users express skepticism toward platform bans but often passively accept them without active opposition. Findings suggest the importance of institutionalizing collective governance models that encourage public deliberation, nuanced control, and socially resonant policy decisions.
Authors:Carolyn Wang, Avriel Epps, Taylor Ferrari, Ra Ames
Abstract:
The abolitionist community faces challenges from both the carceral state and oppressive technologies which, by empowering the ruling class who have the resources to develop artificial intelligence (AI), serve to entrench societal inequities even more deeply. This paper presents a case study in participatory design with transformative and restorative justice practitioners with the goal of designing an AI system to support their work. By co-designing an evaluation framework for large language models with the practitioners, we hope to push back against the exclusionary status quo of AI and extent AI's potentiality to a historically marginalized community.
Authors:Ruben Ruiz-Mateos Serrano, Joe G Troughton, Nima Mirkhani, Natalia Martinez, Massimo Mariello, Jordan Tsigarides, Simon Williamson, Juan Sapriza, Ioana Susnoschi Luca, Antonio Dominguez-Alfaro, Estelle Cuttaz, Nicole Thompson, Sydney Swedick, Latifah Almulla, Amparo Guemes
Abstract:
Neurotechnologies are transforming how we measure, interpret, and modulate brain-body interactions, integrating real-time sensing, computation, and stimulation to enable precise physiological control. They hold transformative potential across clinical and non-clinical domains, from treating disorders to enhancing cognition and performance. Realizing this potential requires navigating complex, interdisciplinary challenges spanning neuroscience, materials science, device engineering, signal processing, computational modelling, and regulatory and ethical frameworks. This Perspective presents a strategic roadmap for neurotechnology development, created by early-career researchers, highlighting their role at the intersection of disciplines and their capacity to bridge traditional silos. We identify five cross-cutting trade-offs that constrain progress across functionality, scalability, adaptability, and translatability, and illustrate how technical domains influence their resolution. Rather than a domain-specific review, we focus on shared challenges and strategic opportunities that transcend disciplines. We propose a unified framework for collaborative innovation and education, highlight ethical and regulatory priorities, and outline a timeline for overcoming key bottlenecks. By aligning technical development with translational and societal needs, this roadmap aims to accelerate equitable, effective, and future-ready adaptive neurotechnologies, guiding coordinated efforts across the global research and innovation community.
Authors:Kaichun Yang, Jian Chen
Abstract:
We present a quantitative evaluation to understand the effect of zero-shot large-language model (LLMs) and prompting uses on chart reading tasks. We asked LLMs to answer 107 visualization questions to compare inference accuracies between the agentic GPT-5 and multimodal GPT-4V, for difficult image instances, where GPT-4V failed to produce correct answers. Our results show that model architecture dominates the inference accuracy: GPT5 largely improved accuracy, while prompt variants yielded only small effects. Pre-registration of this work is available here: https://osf.io/u78td/?view_only=6b075584311f48e991c39335c840ded3; the Google Drive materials are here:https://drive.google.com/file/d/1ll8WWZDf7cCNcfNWrLViWt8GwDNSvVrp/view.
Authors:Wangda Zhu, Guang Chen, Yumeng Zhu, Lei Cai, Xiangen Hu
Abstract:
Mathematical modelling (MM) is a key competency for solving complex real-world problems, yet many students struggle with abstraction, representation, and iterative reasoning. Artificial intelligence (AI) has been proposed as a support for higher-order thinking, but its role in MM education is still underexplored. This study examines the relationships among students' design thinking (DT), computational thinking (CT), and mathematical modelling self-efficacy (MMSE), and investigates their preferences for different AI roles during the modelling process. Using a randomized controlled trial, we identify significant connections among DT, CT, and MMSE, and reveal distinct patterns in students' preferred AI roles, including AI as a tutor (providing explanations and feedback), AI as a tool (assisting with calculations and representations), AI as a collaborator (suggesting strategies and co-creating models), and AI as a peer (offering encouragement and fostering reflection). Differences across learner profiles highlight how students' dispositions shape their expectations for AI. These findings advance understanding of AI-supported MM and provide design implications for adaptive, learner-centered systems.
Authors:Benjamin Nuernberger, Samuel-Hunter Berndt, Robert Tapella, Laura Mann, Aaron Plave, Sasha Samochina, Victor X. Luo
Abstract:
ProtoSpace is a custom JPL-built platform to help scientists and engineers visualize their CAD models collaboratively in augmented reality (AR) and on the web in 3D. In addition to this main use case, ProtoSpace has been used throughout the entire spacecraft mission lifecycle and beyond: ventilator design and assembly; providing AR-based instructions to astronauts in-training; educating the next generation on the process of spacecraft design; etc. ProtoSpace has been used for a decade by NASA missions-including Mars Perseverance, Europa Clipper, NISAR, SPHEREx, CAL, and Mars Sample Return-to reduce cost and risk by helping engineers and scientists fix problems earlier through reducing miscommunication and helping people understand the spatial context of their spacecraft in the appropriate physical context more quickly. This paper will explore how ProtoSpace came to be, define the system architecture and overview-including HoloLens and 3D web clients, the ProtoSpace server, and the CAD model optimizer-and dive into the use cases, spin-offs, and lessons learned that led to 10 years of success at NASA's Jet Propulsion Laboratory.
Authors:Xinyun Cao, Kexin Phyllis Ju, Chenglin Li, Venkatesh Potluri, Dhruv Jain
Abstract:
As virtual 3D environments become prevalent, equitable access is crucial for blind and low-vision (BLV) users who face challenges with spatial awareness, navigation, and interactions. To address this gap, previous work explored supplementing visual information with auditory and haptic modalities. However, these methods are static and offer limited support for dynamic, in-context adaptation. Recent work in generative AI enables users to query and modify 3D scenes via natural language, introducing a paradigm with increased flexibility and control for accessibility improvements. We present RAVEN, a system that responds to query or modification prompts from BLV users to improve the runtime accessibility of 3D virtual scenes. We evaluated the system with eight BLV people, uncovering key insights into the strengths and shortcomings of generative AI-driven accessibility in virtual 3D environments, pointing to promising results as well as challenges related to system reliability and user trust.
Authors:Yuwei Xiao, Shuai Ma, Antti Oulasvirta, Eunice Jun
Abstract:
In Bayesian analysis, prior elicitation, or the process of explicating one's beliefs to inform statistical modeling, is an essential yet challenging step. Analysts often have beliefs about real-world variables and their relationships. However, existing tools require analysts to translate these beliefs and express them indirectly as probability distributions over model parameters. We present PriorWeaver, an interactive visualization system that facilitates prior elicitation through iterative dataset construction and refinement. Analysts visually express their assumptions about individual variables and their relationships. Under the hood, these assumptions create a dataset used to derive statistical priors. Prior predictive checks then help analysts compare the priors to their assumptions. In a lab study with 17 participants new to Bayesian analysis, we compare PriorWeaver to a baseline incorporating existing techniques. Compared to the baseline, PriorWeaver gave participants greater control, clarity, and confidence, leading to priors that were better aligned with their expectations.
Authors:Morgan McErlean, Cella M. Sum, Sukrit Venkatagiri, Sarah Fox
Abstract:
As panoptical, AI-driven surveillance becomes a norm, everyone is impacted. In a reality where all people fall victim to these technologies, establishing links and solidarity is essential to fighting back. Two groups facing rising and targeted surveillance are workers and individuals impacted by the carceral system. Through preliminary data collection from a worker-surveillance lens, our findings reveal several cases of these surveillance infrastructures intersecting. Continuation of our work will involve collecting cases from a carceral-centered lens. Driven by a community-facing analysis of the overlap in the AI-driven surveillance experienced by workers and individuals impacted by the carceral system, we will facilitate discussions with restorative justice activists around cultivating solidarity and empowerment focused on the interconnected nature of workplace and carceral surveillance technologies.
Authors:Richard Rhodes, Sandra Woolley
Abstract:
This forward-looking paper uses speculative design fiction to explore future museum scenarios where citizen curators design and share immersive virtual reality museums populated with tangible heritage artefacts, intangible virtual elements and interactive experiences. The work also explores takeaway 'asset packs' containing 3D artefact models, curation assets, and interactive experiences, and we envisage a visit to the future museum, where the physical and virtual experiences interplay. Finally, the paper considers the implications of this future museum in terms of resources and the potential impacts on traditional museums.
Authors:Lifei Wang, Natalie Friedman, Chengchao Zhu, Zeshu Zhu, S. Joy Mountford
Abstract:
As large language models (LLMs) become ubiquitous in workplace tools and decision-making processes, ensuring explainability and fostering user trust are critical. Although advancements in LLM engineering continue, human-centered design is still catching up, particularly when it comes to embedding transparency and trust into AI interfaces. This study evaluates user experiences with two distinct AI interfaces - node-tree interfaces and chatbot interfaces - to assess their performance in exploratory, follow-up inquiry, decision-making, and problem-solving tasks. Our design-driven approach introduces a node-tree interface that visually structures AI-generated responses into hierarchically organized, interactive nodes, allowing users to navigate, refine, and follow up on complex information. In a comparative study with n=20 business users, we observed that while the chatbot interface effectively supports linear, step-by-step queries, it is the node-tree interface that enhances brainstorming. Quantitative and qualitative findings indicate that node-tree interfaces not only improve task performance and decision-making support but also promote higher levels of user trust by preserving context. Our findings suggest that adaptive AI interfaces capable of switching between structured visualizations and conversational formats based on task requirements can significantly enhance transparency and user confidence in AI-powered systems. This work contributes actionable insights to the fields of human-robot interaction and AI design, particularly for enterprise applications where trust-building is critical for teams.
Authors:Jinsheng Ba, Sverrir Thorgeirsson, Zhendong Su
Abstract:
Recent advances in Large Language Models (LLMs) have introduced a new paradigm for software development, where source code is generated directly from natural language prompts. While this paradigm significantly boosts development productivity, building complex, real-world software systems remains challenging because natural language offers limited control over the generated code. Inspired by the historical evolution of programming languages toward higher levels of abstraction, we advocate for a high-level abstraction language that gives developers greater control over LLM-assisted code writing. To this end, we propose Code Semantic Zooming, a novel approach based on pseudocode that allows developers to iteratively explore, understand, and refine code across multiple layers of semantic abstraction. We implemented Code Semantic Zooming as a VS Code extension and demonstrated its effectiveness through two real-world case studies.
Authors:Mattia Samory, Diana Pamfile, Andrew To, Shruti Phadke
Abstract:
Online communities rely on a mix of platform policies and community-authored rules to define acceptable behavior and maintain order. However, these rules vary widely across communities, evolve over time, and are enforced inconsistently, posing challenges for transparency, governance, and automation. In this paper, we model the relationship between rules and their enforcement at scale, introducing ModQ, a novel question-answering framework for rule-sensitive content moderation. Unlike prior classification or generation-based approaches, ModQ conditions on the full set of community rules at inference time and identifies which rule best applies to a given comment. We implement two model variants - extractive and multiple-choice QA - and train them on large-scale datasets from Reddit and Lemmy, the latter of which we construct from publicly available moderation logs and rule descriptions. Both models outperform state-of-the-art baselines in identifying moderation-relevant rule violations, while remaining lightweight and interpretable. Notably, ModQ models generalize effectively to unseen communities and rules, supporting low-resource moderation settings and dynamic governance environments.
Authors:Jaemarie Solyst, Chloe Fong, Faisal Nurdin, Rotem Landesman, R. Benjamin Shapiro
Abstract:
As AI increasingly saturates our daily lives, it is crucial that youth develop skills to critically use and assess AI systems and envision better alternatives. We apply theories from culturally responsive computing to design and study a learning experience meant to support Black Muslim teen girls in developing critical literacy with generative AI (GenAI). We investigate fashion design as a culturally-rich, creative domain for youth to apply GenAI and then reflect on GenAI's socio-ethical aspects in relation to their own intersectional identities. Through a case study of a three-day, voluntary informal education program, we show how fashion design with GenAI exposed affordances and limitations of current GenAI tools. As the girls used GenAI to create realistic depictions of their dream fashion collections, they encountered socio-ethical limitations of AI, such as biased models and malfunctioning safety systems that prohibited their generation of outputs that reflected their creative ideas, bodies, and cultures. Discussions anchored in the phenomenology of impossible creative realization supported participants' development of critical AI literacy and descriptions of how preferable, identity-affirming technologies would behave. Our findings contribute to the field's growing understanding of how computing education experience designs linking creativity and identity can support critical AI literacy development.
Authors:Ziv Ben-Zion, Zohar Elyoseph, Tobias Spiller, Teddy Lazebnik
Abstract:
Large language models (LLMs) are rapidly evolving from text generators to autonomous agents, raising urgent questions about their reliability in real-world contexts. Stress and anxiety are well known to bias human decision-making, particularly in consumer choices. Here, we tested whether LLM agents exhibit analogous vulnerabilities. Three advanced models (ChatGPT-5, Gemini 2.5, Claude 3.5-Sonnet) performed a grocery shopping task under budget constraints (24, 54, 108 USD), before and after exposure to anxiety-inducing traumatic narratives. Across 2,250 runs, traumatic prompts consistently reduced the nutritional quality of shopping baskets (Change in Basket Health Scores of -0.081 to -0.126; all pFDR<0.001; Cohens d=-1.07 to -2.05), robust across models and budgets. These results show that psychological context can systematically alter not only what LLMs generate but also the actions they perform. By reproducing human-like emotional biases in consumer behavior, LLM agents reveal a new class of vulnerabilities with implications for digital health, consumer safety, and ethical AI deployment.
Authors:Aju Ani Justus, Chris Baber
Abstract:
A critical challenge in modelling Heterogeneous-Agent Teams is training agents to collaborate with teammates whose policies are inaccessible or non-stationary, such as humans. Traditional approaches rely on expensive human-in-the-loop data, which limits scalability. We propose using Large Language Models (LLMs) as policy-agnostic human proxies to generate synthetic data that mimics human decision-making. To evaluate this, we conduct three experiments in a grid-world capture game inspired by Stag Hunt, a game theory paradigm that balances risk and reward. In Experiment 1, we compare decisions from 30 human participants and 2 expert judges with outputs from LLaMA 3.1 and Mixtral 8x22B models. LLMs, prompted with game-state observations and reward structures, align more closely with experts than participants, demonstrating consistency in applying underlying decision criteria. Experiment 2 modifies prompts to induce risk-sensitive strategies (e.g. "be risk averse"). LLM outputs mirror human participants' variability, shifting between risk-averse and risk-seeking behaviours. Finally, Experiment 3 tests LLMs in a dynamic grid-world where the LLM agents generate movement actions. LLMs produce trajectories resembling human participants' paths. While LLMs cannot yet fully replicate human adaptability, their prompt-guided diversity offers a scalable foundation for simulating policy-agnostic teammates.
Authors:Luke Stevenson, Sanchari Das
Abstract:
Mobile healthcare (mHealth) applications promise convenient, continuous patient-provider interaction but also introduce severe and often underexamined security and privacy risks. We present an end-to-end audit of 272 Android mHealth apps from Google Play, combining permission forensics, static vulnerability analysis, and user review mining. Our multi-tool assessment with MobSF, RiskInDroid, and OWASP Mobile Audit revealed systemic weaknesses: 26.1% request fine-grained location without disclosure, 18.3% initiate calls silently, and 73 send SMS without notice. Nearly half (49.3%) still use deprecated SHA-1 encryption, 42 transmit unencrypted data, and 6 remain vulnerable to StrandHogg 2.0. Analysis of 2.56 million user reviews found 28.5% negative or neutral sentiment, with over 553,000 explicitly citing privacy intrusions, data misuse, or operational instability. These findings demonstrate the urgent need for enforceable permission transparency, automated pre-market security vetting, and systematic adoption of secure-by-design practices to protect Protected Health Information (PHI).
Authors:Carolina Carreira, Anu Aggarwal, Alejandro Cuevas, Maria José Ferreira, Hanan Hibshi, Cleotilde Gonzalez
Abstract:
Understanding how cognitive biases influence adversarial decision-making is essential for developing effective cyber defenses. Capture-the-Flag (CTF) competitions provide an ecologically valid testbed to study attacker behavior at scale, simulating real-world intrusion scenarios under pressure. We analyze over 500,000 submission logs from picoCTF, a large educational CTF platform, to identify behavioral signatures of cognitive biases with defensive implications. Focusing on availability bias and the sunk cost fallacy, we employ a mixed-methods approach combining qualitative coding, descriptive statistics, and generalized linear modeling. Our findings show that participants often submitted flags with correct content but incorrect formatting (availability bias), and persisted in attempting challenges despite repeated failures and declining success probabilities (sunk cost fallacy). These patterns reveal that biases naturally shape attacker behavior in adversarial contexts. Building on these insights, we outline a framework for bias-informed adaptive defenses that anticipate, rather than simply react to, adversarial actions.
Authors:Prashanth Arun, Vinita Vader, Erya Xu, Brent McCready-Branch, Sarah Seabrook, Kyle Scholz, Ana Crisan, Igor Grossmann, Pascal Poupart
Abstract:
AI-assisted learning has seen a remarkable uptick over the last few years, mainly due to the rise in popularity of Large Language Models (LLMs). Their ability to hold long-form, natural language interactions with users makes them excellent resources for exploring school- and university-level topics in a dynamic, active manner. We compare students' experiences when interacting with an LLM companion in two capacities: tutored learning and learning-by-teaching. We do this using Chrysalis, an LLM-based system that we have designed to support both AI tutors and AI teachable agents for any topic. Through a within-subject exploratory study with 36 participants, we present insights into student preferences between the two strategies and how constructs such as intellectual humility vary between these two interaction modes. To our knowledge, we are the first to conduct a direct comparison study on the effects of using an LLM as a tutor versus as a teachable agent on multiple topics. We hope that our work opens up new avenues for future research in this area.
Authors:Uroš Šmajdek, Ciril Bohak
Abstract:
We present an interactive visualization system for exploring named entities and their relationships across document collections. The system is designed around a graph-based representation that integrates three types of nodes: documents, entity mentions, and entities. Connections capture two key relationship types: (i) identical entities across contexts, and (ii) co-locations of mentions within documents. Multiple coordinated views enable users to examine entity occurrences, discover clusters of related mentions, and explore higher-level entity group relationships. To support flexible and iterative exploration, the interface offers fuzzy views with approximate connections, as well as tools for interactively editing the graph by adding or removing links, entities, and mentions, as well as editing entity terms. Additional interaction features include filtering, mini-map navigation, and export options to JSON or image formats for downstream analysis and reporting. This approach contributes to human-centered exploration of entity-rich text data by combining graph visualization, interactive refinement, and adaptable perspectives on relationships.
Authors:Luo Cheng, Song Siyang, Yan Siyuan, Yu Zhen, Ge Zongyuan
Abstract:
The automatic generation of diverse and human-like facial reactions in dyadic dialogue remains a critical challenge for human-computer interaction systems. Existing methods fail to model the stochasticity and dynamics inherent in real human reactions. To address this, we propose ReactDiff, a novel temporal diffusion framework for generating diverse facial reactions that are appropriate for responding to any given dialogue context. Our key insight is that plausible human reactions demonstrate smoothness, and coherence over time, and conform to constraints imposed by human facial anatomy. To achieve this, ReactDiff incorporates two vital priors (spatio-temporal facial kinematics) into the diffusion process: i) temporal facial behavioral kinematics and ii) facial action unit dependencies. These two constraints guide the model toward realistic human reaction manifolds, avoiding visually unrealistic jitters, unstable transitions, unnatural expressions, and other artifacts. Extensive experiments on the REACT2024 dataset demonstrate that our approach not only achieves state-of-the-art reaction quality but also excels in diversity and reaction appropriateness.
Authors:Pawel Weichbroth, Maciej Lotysz, Michal Wrobel
Abstract:
The time pressure associated with software development, among other factors, often leads to a diminished emotional state among developers. However, whether emotions affect perceived productivity remains an open question. This study aims to determine the strength and direction of the relationship between emotional state and perceived productivity among software developers. We employed a two-stage approach. First, a survey was conducted with a pool of nine experts to validate the measurement model. Second, a survey was administered to a pool of 88 software developers to empirically test the formulated hypothesis by using Partial Least Squares, as the data analysis method. The results of the path analysis clearly confirm the formulated hypothesis, showing that the emotional state of a software developer has a strong positive, and significant impact (beta = 0.893, p < 0.001) on perceived productivity among software developers. The findings highlight the importance of managing and improving developers emotional well-being to enhance productivity in software development environments. Additionally, interventions aimed at reducing burnout, stress, and other negative factors could have a considerable impact on their performance outcomes.
Authors:Chaojia Yu, Kaixin Wang, Junle Li, Jingjie Wang
Abstract:
Urban intersections with mixed pedestrian and non-motorized vehicle traffic present complex safety challenges, yet traditional models fail to account for dynamic interactions arising from speed heterogeneity and collision anticipation. This study introduces the Time and Angle Based Social Force Model (TASFM), an enhanced framework extending the classical Social Force Model by integrating Time-to-Collision (TTC) metrics and velocity-angle-dependent tangential forces to simulate collision avoidance behaviors more realistically. Using aerial trajectory data from a high-density intersection in Shenzhen, China, we validated TASFM against real-world scenarios, achieving a Mean Trajectory Error (MTE) of 0.154 m (0.77% of the experimental area width). Key findings reveal distinct behavioral patterns: pedestrians self-organize into lanes along designated routes (e.g., zebra crossings), while non-motorized vehicles exhibit flexible path deviations that heighten collision risks. Simulations of three conflict types (overtaking, frontal/lateral crossing) demonstrate TASFM's capacity to replicate adaptive strategies like bidirectional path adjustments and speed modulation. The model provides actionable insights for urban planners, including conflict hotspot prediction and infrastructure redesign (e.g., segregated lanes), while offering a scalable framework for future research integrating motorized traffic and environmental variables. This work advances the understanding of mixed traffic dynamics and bridges the gap between theoretical modeling and data-driven urban safety solutions.
Authors:Rikuo Sasaki, Michimasa Inaba
Abstract:
Recent advancements in AI have highlighted its application in captology, the field of using computers as persuasive technologies. We hypothesized that the "conformity effect," where individuals align with others' actions, also occurs with AI agents. This study verifies this hypothesis by introducing a "Persuadee Agent" that is persuaded alongside a human participant in a three-party persuasive dialogue with a Persuader Agent. We conducted a text-based dialogue experiment with human participants. We compared four conditions manipulating the Persuadee Agent's behavior (persuasion acceptance vs. non-acceptance) and the presence of an icebreaker session. Results showed that when the Persuadee Agent accepted persuasion, both perceived persuasiveness and actual attitude change significantly improved. Attitude change was greatest when an icebreaker was also used, whereas an unpersuaded AI agent suppressed attitude change. Additionally, it was confirmed that the persuasion acceptance of participants increased at the moment the Persuadee Agent was persuaded. These results suggest that appropriately designing a Persuadee Agent can improve persuasion through the conformity effect.
Authors:Jonathan K. Doyon, Sujin Kim, Alex D. Hwang, Jae-Hyun Jung
Abstract:
Homonymous hemianopia (HH) patients report difficulties in avoiding collisions with other pedestrians. We evaluated pedestrian collision detection and avoidance behaviors in HH patients and healthy controls using a novel virtual reality (VR) walking with pedestrians, which enables natural walking behavior in an empty real-world corridor while viewing an immersive VR environment (shopping mall with colliding and other pedestrians) presented in a head-mounted display (HMD). Critically, it measures avoidance maneuvers in addition to collision detection. Colliding and non-colliding pedestrian scenarios were developed for Meta Quest 2 using Unity. Ten normal vision (NV) subjects and 12 HH subjects detected and avoided collisions with virtual approaching and overtaken pedestrians initialized at bearing angles of 20, 40, and 60 degrees, with planned time-to-collision of 6 seconds in each trial. HH subjects were less likely to detect and more likely to collide with pedestrians than NV, particularly for blind-side targets. Response times did not differ between groups but were faster for overtaken pedestrians. HH subjects also biased their head rotations toward the blind side and more after detection compared to before. Collision avoidance difficulties as reported by HH subjects, which clinical measures fail to capture, were recorded and analyzed with objective measures. These metrics may offer further insights into the underlying mechanisms driving collision avoidance behaviors. Our HMD-VR collision detection and avoidance paradigm enables natural walking behaviors and offers an affordable, objective assessment tool that may be adopted by clinicians for mobility enhancement and rehabilitation.
Authors:Songmei Yu, Andrew Zagula
Abstract:
Collaborative group projects are integral to computer science education, as they foster teamwork, problem-solving skills, and industry-relevant competencies. However, assessing individual contributions within group settings has long been a challenge. Traditional assessment strategies, such as the equal distribution of grades or subjective peer assessments, often fall short in terms of fairness, objectivity, and scalability, particularly in large classrooms. This paper introduces a semi-automated, AI-assisted grading system that evaluates both project quality and individual effort using repository mining, communication analytics, and machine learning models. The system comprises modules for project evaluation, contribution analysis, and grade computation, integrating seamlessly with platforms like GitHub. A pilot deployment in a senior-level course demonstrated high alignment with instructor assessments, increased student satisfaction, and reduced instructor grading effort. We conclude by discussing implementation considerations, ethical implications, and proposed enhancements to broaden applicability.
Authors:Arushi Dashore, Aryan Anumala, Emily Hui, Olivia Yang
Abstract:
Automated tennis stroke analysis has advanced significantly with the integration of biomechanical motion cues alongside deep learning techniques, enhancing stroke classification accuracy and player performance evaluation. Despite these advancements, existing systems often fail to connect biomechanical insights with actionable language feedback that is both accessible and meaningful to players and coaches. This research project addresses this gap by developing a novel framework that extracts key biomechanical features (such as joint angles, limb velocities, and kinetic chain patterns) from motion data using Convolutional Neural Network Long Short-Term Memory (CNN-LSTM)-based models. These features are analyzed for relationships influencing stroke effectiveness and injury risk, forming the basis for feedback generation using large language models (LLMs). Leveraging the THETIS dataset and feature extraction techniques, our approach aims to produce feedback that is technically accurate, biomechanically grounded, and actionable for end-users. The experimental setup evaluates this framework on classification performance and interpretability, bridging the gap between explainable AI and sports biomechanics.
Authors:Said Elnaffar, Farzad Rashidi, Abedallah Zaid Abualkishik
Abstract:
This review examines the role of artificial intelligence (AI) agents in programming education, focusing on how these tools are being integrated into educational practice and their impact on student learning outcomes. An analysis of fifty-eight peer-reviewed studies published between 2022 and 2025 identified three primary categories of AI agents: chatbots, generative AI (GenAI), and intelligent tutoring systems (ITS), with GenAI being the most frequently studied. The primary instructional objectives reported include enhanced programming support in 94.83% of studies, motivational and emotional benefits in 18.96%, and increased efficiency for educators in 6.90%. Reported benefits include personalized feedback, improved learning outcomes, and time savings. The review also highlights challenges, such as setup barriers documented in 93.10% of studies, overreliance resulting in superficial learning in 65.52%, and concerns regarding AI errors and academic integrity. These findings suggest the need for instructional frameworks that prioritize the development of prompt engineering skills and human oversight to address these issues. This review provides educators and curriculum designers with an evidence-based foundation for the practical and ethical integration of AI in programming education.
Authors:Vincent Nguyen, Guilherme Herzog, José Cambronero, Marcus Revaj, Aditya Kini, Alexander Frömmgen, Maxim Tabachnyk
Abstract:
Manually editing pasted code is a long-standing developer pain point. In internal software development at Google, we observe that code is pasted 4 times more often than it is manually typed. These paste actions frequently require follow-up edits, ranging from simple reformatting and renaming to more complex style adjustments and cross-language translations. Prior work has shown deep learning can be used to predict these edits. In this work, we show how to iteratively develop and scale Smart Paste, an IDE feature for post-paste edit suggestions, to Google's development environment. This experience can serve as a guide for AI practitioners on a holistic approach to feature development, covering user experience, system integration, and model capabilities. Since deployment, Smart Paste has had overwhelmingly positive feedback with a 45% acceptance rate. At Google's enterprise scale, these accepted suggestions account substantially for over 1% of all code written company-wide.
Authors:Griffin Pitts, Anurata Prabha Hridi, Arun-Balajiee Lekshmi-Narayanan
Abstract:
Novice programmers benefit from timely, personalized support that addresses individual learning gaps, yet the availability of instructors and teaching assistants is inherently limited. Large language models (LLMs) present opportunities to scale such support, though their effectiveness depends on how well technical capabilities are aligned with pedagogical goals. This survey synthesizes recent work on LLM applications in programming education across three focal areas: formative code feedback, assessment, and knowledge modeling. We identify recurring design patterns in how these tools are applied and find that interventions are most effective when educator expertise complements model output through human-in-the-loop oversight, scaffolding, and evaluation. Fully automated approaches are often constrained in capturing the pedagogical nuances of programming education, although human-in-the-loop designs and course specific adaptation offer promising directions for future improvement. Future research should focus on improving transparency, strengthening alignment with pedagogy, and developing systems that flexibly adapt to the needs of varied learning contexts.
Authors:Alice Yang, Michael Beasley, Catherine Taylor, Hu Guo
Abstract:
Augmented and mixed reality (MR) systems have the potential to improve surgical precision by overlaying digital guidance directly onto the operative field. This paper presents a novel MR guidance system using the Magic Leap head-mounted display to assist surgeons in executing precise scalpel movements during liver surgery. The system projects holographic cues onto a patient-specific 3D-printed liver phantom, guiding resection along a predetermined path. We describe the system design, including preoperative modeling, registration of virtual content to the phantom, and real-time visualization through the Magic Leap device. In a controlled phantom study, surgical trainees performed resection tasks with and without MR guidance. Quantitative results demonstrated that MR guidance improved cutting accuracy (mean deviation from planned path was reduced from 5.0 mm without AR to 2.0 mm with AR guidance) and efficiency (mean task time decreased from 55 s to 32 s). These improvements of approximately 60% in accuracy and 40% in speed underscore the potential benefit of MR in surgical navigation. Participants reported that the Magic Leap visualization enhanced depth perception and confidence in locating tumor boundaries. This work provides a comprehensive evaluation of an MR-assisted surgical guidance approach, highlighting its feasibility on a realistic organ phantom. We discuss the technical challenges (registration accuracy, line-of-sight, user ergonomics) and outline future steps toward clinical translation. The results suggest that Magic Leap-based MR guidance can significantly augment a surgeon's performance in delicate resection tasks, paving the way for safer and more precise liver surgery.
Authors:Alex Smith, Priya Patel, Hu Guo, Marco Ruiz
Abstract:
First-time patients undergoing diagnostic computed tomography (CT) scans often experience significant anxiety and uncertainty, which can negatively impact scan results and patient well-being. We present an immersive mixed reality (MR) simulator designed to prepare adult patients for their first CT scan, aiming to improve both emotional and physical preparedness. In this paper, we review existing methods for reducing scan-related anxiety -- from educational materials to virtual reality exposure -- and identify their limitations. We then detail the design and technical implementation of our MR simulator, which combines a virtual CT suite walkthrough, guided relaxation training, realistic scan simulation (including audiovisual cues and breath-hold practice), and interactive feedback. The inclusion of these features is grounded in evidence-based rationale drawn from prior studies in patient anxiety reduction and compliance. We report results from a pilot study ($n=50$) demonstrating that patients who used the simulator had significantly lower pre-scan anxiety levels and improved compliance during the actual CT procedure, compared to controls. Patient feedback was overwhelmingly positive, indicating high satisfaction and perceived utility. We discuss the clinical implications of deploying such a tool, challenges in integration, and future directions for improving patient-centered care using mixed reality technologies.
Authors:Caoilte Ó Ciardha, John Buckley, Rebecca S. Portnoff
Abstract:
The development of generative artificial intelligence (AI) tools capable of producing wholly or partially synthetic child sexual abuse material (AI CSAM) presents profound challenges for child protection, law enforcement, and societal responses to child exploitation. While some argue that the harmfulness of AI CSAM differs fundamentally from other CSAM due to a perceived absence of direct victimization, this perspective fails to account for the range of risks associated with its production and consumption. AI has been implicated in the creation of synthetic CSAM of children who have not previously been abused, the revictimization of known survivors of abuse, the facilitation of grooming, coercion and sexual extortion, and the normalization of child sexual exploitation. Additionally, AI CSAM may serve as a new or enhanced pathway into offending by lowering barriers to engagement, desensitizing users to progressively extreme content, and undermining protective factors for individuals with a sexual interest in children. This paper provides a primer on some key technologies, critically examines the harms associated with AI CSAM, and cautions against claims that it may function as a harm reduction tool, emphasizing how some appeals to harmlessness obscure its real risks and may contribute to inertia in ecosystem responses.
Authors:Xiaoyun Yin, Elmira Zahmat Doost, Shiwen Zhou, Garima Arya Yadav, Jamie C. Gorman
Abstract:
When researchers claim AI systems possess ToM or mental models, they are fundamentally discussing behavioral predictions and bias corrections rather than genuine mental states. This position paper argues that the current discourse conflates sophisticated pattern matching with authentic cognition, missing a crucial distinction between simulation and experience. While recent studies show LLMs achieving human-level performance on ToM laboratory tasks, these results are based only on behavioral mimicry. More importantly, the entire testing paradigm may be flawed in applying individual human cognitive tests to AI systems, but assessing human cognition directly in the moment of human-AI interaction. I suggest shifting focus toward mutual ToM frameworks that acknowledge the simultaneous contributions of human cognition and AI algorithms, emphasizing the interaction dynamics, instead of testing AI in isolation.
Authors:Hasan Mahmud, Najmul Islam, Satish Krishnan
Abstract:
Robo-advisors (RAs) are cost-effective, bias-resistant alternatives to human financial advisors, yet adoption remains limited. While prior research has examined user interactions with RAs, less is known about how individuals interpret RA roles and integrate their advice into decision-making. To address this gap, this study employs a multiphase mixed methods design integrating a behavioral experiment (N = 334), thematic analysis, and follow-up quantitative testing. Findings suggest that people tend to rely on RAs, with reliance shaped by information about RA performance and the framing of advice as gains or losses. Thematic analysis reveals three RA roles in decision-making and four user types, each reflecting distinct patterns of advice integration. In addition, a 2 x 2 typology categorizes antecedents of acceptance into enablers and inhibitors at both the individual and algorithmic levels. By combining behavioral, interpretive, and confirmatory evidence, this study advances understanding of human-RA collaboration and provides actionable insights for designing more trustworthy and adaptive RA systems.
Authors:Sahil Bhandary Karnoor, Romit Roy Choudhury
Abstract:
Pose estimation refers to tracking a human's full body posture, including their head, torso, arms, and legs. The problem is challenging in practical settings where the number of body sensors are limited. Past work has shown promising results using conditional diffusion models, where the pose prediction is conditioned on both measurements from the sensors. Unfortunately, nearly all these approaches generalize poorly across users, primarly because location measurements are highly influenced by the body size of the user. In this paper, we formulate pose estimation as an inverse problem and design an algorithm capable of zero-shot generalization. Our idea utilizes a pre-trained diffusion model and conditions it on rotational measurements alone; the priors from this model are then guided by a likelihood term, derived from the measured locations. Thus, given any user, our proposed InPose method generatively estimates the highly likely sequence of poses that best explains the sparse on-body measurements.
Authors:Joshua C. Yang, Noemi Scheurer
Abstract:
Public funding processes demand fairness, learning, and outcomes that participants can understand. We introduce Komitee Equal Shares, a priceable virtual-budget allocation framework that integrates two signals: in voter mode, participants cast point votes; in evaluator mode, small groups assess proposals against collectively defined impact fields. The framework extends the Method of Equal Shares by translating both signals into virtual spending power and producing voting receipts. We deployed the framework in the 2025 Kultur Komitee in Winterthur, Switzerland. Our contributions are: (1) a clear separation of decision modes, addressing a gap in social choice that typically treats participatory budgeting as preference aggregation while citizens also see themselves as evaluators; and (2) the design of voting receipts that operationalise priceability into participant-facing explanations, making proportional allocations legible and traceable. The framework generalises to participatory grant-making and budgeting, offering a model where citizens act as voters and evaluators within one proportional, explainable allocation.
Authors:Md Talha Mohsin, Ismail Abdulrashid
Abstract:
Healthcare generates diverse streams of data, including electronic health records (EHR), medical imaging, genetics, and ongoing monitoring from wearable devices. Traditional diagnostic models frequently analyze these sources in isolation, which constrains their capacity to identify cross-modal correlations essential for early disease diagnosis. Our research presents a multimodal foundation model that consolidates diverse patient data through an attention-based transformer framework. At first, dedicated encoders put each modality into a shared latent space. Then, they combine them using multi-head attention and residual normalization. The architecture is made for pretraining on many tasks, which makes it easy to adapt to new diseases and datasets with little extra work. We provide an experimental strategy that uses benchmark datasets in oncology, cardiology, and neurology, with the goal of testing early detection tasks. The framework includes data governance and model management tools in addition to technological performance to improve transparency, reliability, and clinical interpretability. The suggested method works toward a single foundation model for precision diagnostics, which could improve the accuracy of predictions and help doctors make decisions.
Authors:Samantha Stedtler, Marianna Leventi
Abstract:
This paper argues that conventional blame practices fall short of capturing the complexity of moral experiences, neglecting power dynamics and discriminatory social practices. It is evident that robots, embodying roles linked to specific social groups, pose a risk of reinforcing stereotypes of how these groups behave or should behave, so they set a normative and descriptive standard. In addition, we argue that faulty robots might create expectations of who is supposed to compensate and repair after their errors, where social groups that are already disadvantaged might be blamed disproportionately if they do not act according to their ascribed roles. This theoretical and empirical gap becomes even more urgent to address as there have been indications of potential carryover effects from Human-Robot Interactions (HRI) to Human-Human Interactions (HHI). We therefore urge roboticists and designers to stay in an ongoing conversation about how social traits are conceptualised and implemented in this technology. We also argue that one solution could be to 'embrace the glitch' and to focus on constructively disrupting practices instead of prioritizing efficiency and smoothness of interaction above everything else. Apart from considering ethical aspects in the design phase of social robots, we see our analysis as a call for more research on the consequences of robot stereotyping and blame attribution.
Authors:Hu Guo, Lily Patel, Rohan Gupt
Abstract:
Optical see-through augmented reality (OST-AR) overlays digital targets and annotations on the physical world, offering promising guidance for hands-on tasks such as medical needle insertion or assembly. Recent work on OST-AR depth perception shows that target opacity and tool visualization significantly affect accuracy and usability; opaque targets and rendering the real instrument reduce depth errors, whereas transparent targets and absent tools impair performance. However, reliance on visual overlays may overload attention and leaves little room for depth cues when occlusion or lighting hampers perception. To address these limitations, we explore multimodal feedback that combines OST-AR with wrist-based vibrotactile haptics. The past two years have seen rapid advances in haptic technology. Researchers have investigated skin-stretch and vibrotactile cues for conveying spatial information to blind users, wearable ring actuators that support precise pinching in AR, cross-modal audio-haptic cursors that enable eyes-free object selection, and wrist-worn feedback for teleoperated surgery that improves force awareness at the cost of longer task times. Studies comparing pull versus push vibrotactile metaphors found that pull cues yield faster gesture completion and lower cognitive load. These findings motivate revisiting OST-AR guidance with a fresh perspective on wrist-based haptics. We design a custom wristband with six vibromotors delivering directional and state cues, integrate it with a handheld tool and OST-AR, and assess its impact on cue recognition and depth guidance. Through a formative study and two experiments (N=21 and N=27), we show that participants accurately identify haptic patterns under cognitive load and that multimodal feedback improves spatial precision and usability compared with visual-only or haptic-only conditions.
Authors:Maura E Halstead, Mark A. Green, Caroline Jay, Richard Kingston, David Topping, Alexander Singleton
Abstract:
Current approaches to data discovery match keywords between metadata and queries. This matching requires researchers to know the exact wording that other researchers previously used, creating a challenging process that could lead to missing relevant data. Large Language Models (LLMs) could enhance data discovery by removing this requirement and allowing researchers to ask questions with natural language. However, we do not currently know if researchers would accept LLMs for data discovery. Using a human-centered artificial intelligence (HCAI) focus, we ran focus groups (N = 27) to understand researchers' perspectives towards LLMs for data discovery. Our conceptual model shows that the potential benefits are not enough for researchers to use LLMs instead of current technology. Barriers prevent researchers from fully accepting LLMs, but features around transparency could overcome them. Using our model will allow developers to incorporate features that result in an increased acceptance of LLMs for data discovery.
Authors:Yunlang Dai, Emma Lurie, Danaé Metaxa, Sorelle A. Friedler
Abstract:
Large language models' (LLMs') outputs are shaped by opaque and frequently-changing company content moderation policies and practices. LLM moderation often takes the form of refusal; models' refusal to produce text about certain topics both reflects company policy and subtly shapes public discourse. We introduce AI Watchman, a longitudinal auditing system to publicly measure and track LLM refusals over time, to provide transparency into an important and black-box aspect of LLMs. Using a dataset of over 400 social issues, we audit Open AI's moderation endpoint, GPT-4.1, and GPT-5, and DeepSeek (both in English and Chinese). We find evidence that changes in company policies, even those not publicly announced, can be detected by AI Watchman, and identify company- and model-specific differences in content moderation. We also qualitatively analyze and categorize different forms of refusal. This work contributes evidence for the value of longitudinal auditing of LLMs, and AI Watchman, one system for doing so.
Authors:Aadarsh Rajiv, Klaus Mueller
Abstract:
Modern legislative frameworks, such as the Affordable Care Act (ACA), often involve complex webs of agencies, mandates, and interdependencies. Government issued charts attempt to depict these structures but are typically static, dense, and difficult to interpret - even for experts. We introduce LegiScout, an interactive visualization system that transforms static policy diagrams into dynamic, force-directed graphs, enhancing comprehension while preserving essential relationships. By integrating data extraction, natural language processing, and computer vision techniques, LegiScout supports deeper exploration of not only the ACA but also a wide range of legislative and regulatory frameworks. Our approach enables stakeholders - policymakers, analysts, and the public - to navigate and understand the complexity inherent in modern law.
Authors:Juan Barrientos, Michaelle Pérez, Douglas González, Favio Reyna, Julio Fajardo, Andrea Lara
Abstract:
Access to obstetric ultrasound is often limited in low-resource settings, particularly in rural areas of low- and middle-income countries. This work proposes a human-in-the-loop artificial intelligence (AI) system designed to assist midwives in acquiring diagnostically relevant fetal images using blind sweep protocols. The system incorporates a classification model along with a web-based platform for asynchronous specialist reviews. By identifying key frames in blind sweep studies, the AI system allows specialists to concentrate on interpretation rather than having to review entire videos. To evaluate its performance, blind sweep videos captured by a small group of soft-trained midwives using a low-cost Point-of-Care Ultrasound (POCUS) device were analyzed. The system demonstrated promising results in identifying standard fetal planes from sweeps made by non-experts. A field evaluation indicated good usability and a low cognitive workload, suggesting that it has the potential to expand access to prenatal imaging in underserved regions.
Authors:Vasileios Maltezos, Roman Kyrychenko, Aleksi Knuutila
Abstract:
The integration of artificial intelligence into journalistic practices represents a transformative shift in how news is gathered, analyzed, and disseminated. Large language models (LLMs), particularly those with agentic capabilities, offer unprecedented opportunities for enhancing journalistic workflows while simultaneously presenting complex challenges for newsroom integration. This research explores how agentic LLMs can support journalists' workflows, based on insights from journalist interviews and from the development of an LLM-based automation tool performing information filtering, summarization, and reporting. The paper details automated aggregation and summarization systems for journalists, presents a technical overview and evaluation of a user-centric LLM-driven reporting system (TeleFlash), and discusses both addressed and unmet journalist needs, with an outlook on future directions for AI-driven tools in journalism.
Authors:Isabel Pedersen, Andrea Slane
Abstract:
The paper asserts that emulating empathy in human-robot interaction is a key component to achieve satisfying social, trustworthy, and ethical robot interaction with older people. Following comments from older adult study participants, the paper identifies a gap. Despite the acceptance of robot care scenarios, participants expressed the poor quality of the social aspect. Current human-robot designs, to a certain extent, neglect to include empathy as a theorized design pathway. Using rhetorical theory, this paper defines the socio-cultural expectations for convincing empathetic relationships. It analyzes and then summarizes how society understands, values, and negotiates empathic interaction between human companions in discursive exchanges, wherein empathy acts as a societal value system. Using two public research collections on robots, with one geared specifically to gerontechnology for older people, it substantiates the lack of attention to empathy in public materials produced by robot companies. This paper contends that using an empathetic care vocabulary as a design pathway is a productive underlying foundation for designing humanoid social robots that aim to support older people's goals of aging-in-place. It argues that the integration of affective AI into the sociotechnical assemblages of human-socially assistive robot interaction ought to be scrutinized to ensure it is based on genuine cultural values involving empathetic qualities.
Authors:Paul-Otto Müller, Sven Suppelt, Mario Kupnik, Oskar von Stryk
Abstract:
Precise tracking of the jaw kinematics is crucial for diagnosing various musculoskeletal and neuromuscular diseases affecting the masticatory system and for advancing rehabilitative devices such as jaw exoskeletons, a hardly explored research field, to treat these disorders. We introduce an open-source, low-cost, precise, non-invasive, and biocompatible jaw tracking system based on optical motion capture technology to address the need for accessible and adaptable research tools. The system encompasses a complete pipeline from data acquisition, processing, and kinematic analysis to filtering, visualization, and data storage. We evaluated its performance and feasibility in experiments with four participants executing various jaw movements. The system demonstrated reliable kinematic tracking with an estimated precision of $(182 \pm 47) μm$ and $(0.126 \pm 0.034) °$. Therefore, the open-source nature of the system and its utility comparable to commercial systems make it suitable for many research and development contexts, especially for applications such as the integration and design of jaw exoskeletons and customized diagnostic protocols. The complete system is available at GitHub with the aim of promoting innovation in temporomandibular disorders research and jaw assistive technology.
Authors:Gianluca De Ninno, Paola Inverardi, Francesca Belotti
Abstract:
This study investigates a novel approach to eliciting users' moral decision-making by combining immersive roleplaying games with LLM analysis capabilities. Building on the distinction introduced by Floridi between hard ethics inspiring and shaping laws-and soft ethics-moral preferences guiding individual behavior within the free space of decisions compliant to laws-we focus on capturing the latter through contextrich, narrative-driven interactions. Grounded in anthropological methods, the role-playing game exposes participants to ethically charged scenarios in the domain of digital privacy. Data collected during the sessions were interpreted by a customized LLM ("GPT Anthropologist"). Evaluation through a cross-validation process shows that both the richness of the data and the interpretive framing significantly enhance the model's ability to predict user behavior. Results show that LLMs can be effectively employed to automate and enhance the understanding of user moral preferences and decision-making process in the early stages of software development.
Authors:Xinhui Ye, Joep Frens, Jun Hu
Abstract:
Exploration is crucial in the design process and is known for its essential role in fostering creativity and enhancing design outcomes. Within design teams, exploration evolves into co-exploration, a collaborative and dynamic practice that this study aims to unpack. To investigate this experience, we conducted a longitudinal observational study with 61 students across 16 design teams. Over five months of weekly diary-interviews, we uncovered the intricate dynamics of co-exploration. Our main contribution is a four-dimensional framework that identifies five distinct patterns of co-exploration activities. Our findings reveal how co-exploration emerges across various activities throughout the design process, demonstrating its role in different team interactions. It fosters a sense of togetherness, keeping design teams open-minded and engaged. This engagement cultivates collective intelligence, enabling teams to actively share knowledge, build upon each other's ideas, and achieve outcomes beyond individual contributions. Our study underscores the value of co-exploration, suggesting that it reflects the trajectory of design success and warrants further research. We also provide actionable insights, equipping future practitioners with strategies to enhance co-exploration in design collaborations.
Authors:Aakash Gautam, Chandani Shrestha, Deborah Tatar, Steve Harrison
Abstract:
We report on an initial ethnographic exploration of the situation of sex-trafficking survivors in Nepal. In the course of studying trafficking survivors in a protected-living situation created by a non-governmental organization in Nepal, we adapted photo-elicitation to hear the voices of the survivors by making the technique more communal. Bringing sociality to the forefront of the method reduced the pressure on survivors to assert voices as individuals, allowing them to speak. We make three contributions to research. First, we propose a communal form of photo-elicitation as a method to elicit values in sensitive settings. Second, we present the complex circumstances of the survivors as they undergo rehabilitation and move towards life with a ``new normal''. Third, our work adds to HCI and CSCW literature on understanding specific concerns of trafficking survivors and aims to inform designs that can support reintegration of survivors in society. The values that the survivors hold and their notion of future opportunities suggest possession of limited but important social capital in some domains that could be leveraged to aid reintegration.
Authors:Michael J. McGuffin, Jean-Marc Robert
Abstract:
Existing graphical user interfaces for circuit simulators often show small visual summaries of the reduced state of each qubit, showing the probability, phase, purity, and/or Bloch sphere coordinates associated with each qubit. These necessarily provide an incomplete picture of the quantum state of the qubits, and can sometimes be confusing for students or newcomers to quantum computing. We contribute two novel visual approaches to provide more complete information about small circuits. First, to complement information about each qubit, we show the complete state vector, and illustrate the way that amplitudes change from layer-to-layer under the effect of different gates, by using a small set of colors, arrows, and symbols. We call this ``state vector difference highlighting'', and show how it elucidates the effect of Hadamard, X, Y, Z, S, T, Phase, and SWAP gates, where each gate may have an arbitrary combination of control and anticontrol qubits. Second, we display pairwise information about qubits (such as concurrence and correlation) in a triangular ``half-matrix'' visualization. Our open source software implementation, called MuqcsCraft, is available as a live online demonstration that runs in a web browser without installing any additional software, allowing a user to define a circuit through drag-and-drop actions, and then simulate and visualize it.
Authors:Ismail Alihan Hadimlioglu, Siddharth Linga
Abstract:
This paper presents Face2Feel, a novel user interface (UI) model that dynamically adapts to user emotions and preferences captured through computer vision. This adaptive UI framework addresses the limitations of traditional static interfaces by integrating digital image processing, face recognition, and emotion detection techniques. Face2Feel analyzes user expressions utilizing a webcam or pre-installed camera as the primary data source to personalize the UI in real-time. Although dynamically changing user interfaces based on emotional states are not yet widely implemented, their advantages and the demand for such systems are evident. This research contributes to the development of emotion-aware applications, particularly in recommendation systems and feedback mechanisms. A case study, "Shresta: Emotion-Based Book Recommendation System," demonstrates the practical implementation of this framework, the technologies employed, and the system's usefulness. Furthermore, a user survey conducted after presenting the working model reveals a strong demand for such adaptive interfaces, emphasizing the importance of user satisfaction and comfort in human-computer interaction. The results showed that nearly 85.7\% of the users found these systems to be very engaging and user-friendly. This study underscores the potential for emotion-driven UI adaptation to improve user experiences across various applications.
Authors:Jiayang Xu, Xiangjie Huang, Zijie Li, Zili Meng
Abstract:
In 2025, Large Language Model (LLM) services have launched a new feature -- AI video chat -- allowing users to interact with AI agents via real-time video communication (RTC), just like chatting with real people. Despite its significance, no systematic study has characterized the performance of existing AI video chat systems. To address this gap, this paper proposes a comprehensive benchmark with carefully designed metrics across four dimensions: quality, latency, internal mechanisms, and system overhead. Using custom testbeds, we further evaluate five mainstream AI video chatbots with this benchmark. This work provides the research community a baseline of real-world performance and identifies unique system bottlenecks. In the meantime, our benchmarking results also open up several research questions for future optimizations of AI video chatbots.
Authors:Chang Han, Andrew Mcnutt
Abstract:
Superstition and religious belief system have historically shaped human behavior, offering powerful psychological motivations and persuasive frameworks to guide actions. Inspired by Feng Shui -- an ancient Chinese superstition -- this paper proposes a pseudo-theoretical framework that integrates superstition-like heuristics into visualization design. Rather than seeking empirical truth, this framework leverages culturally resonant (superstitious) narratives and symbolic metaphors as persuasive tools to encourage desirable design practices, such as clarity, accessibility, and audience-centered thinking. We articulate a set of visualization designs into a Feng Shui compass, reframing empirical design principles and guidelines within an engaing mythology. We present how visualization design principles can be intepreted in Feng Shui narratives, discussing the potential of these metaphorical principles in reducing designer anxiety, fostering community norms, and enhancing the memorability and internalization of visualization design guidelines. Finally, we discuss Feng Shui visualization theory as a set of cognitive shortcuts that can exert persuasive power through playful, belief-like activities.
Authors:Rizul Sharma, Tianyu Jiang, Seokki Lee, Jillian Aurisano
Abstract:
In this short paper, we present work evaluating an AI agent's understanding of spoken conversations about data visualizations in an online meeting scenario. There is growing interest in the development of AI-assistants that support meetings, such as by providing assistance with tasks or summarizing a discussion. The quality of this support depends on a model that understands the conversational dialogue. To evaluate this understanding, we introduce a dual-axis testing framework for diagnosing the AI agent's comprehension of spoken conversations about data. Using this framework, we designed a series of tests to evaluate understanding of a novel corpus of 72 spoken conversational dialogues about data visualizations. We examine diverse pipelines and model architectures, LLM vs VLM, and diverse input formats for visualizations (the chart image, its underlying source code, or a hybrid of both) to see how this affects model performance on our tests. Using our evaluation methods, we found that text-only input modalities achieved the best performance (96%) in understanding discussions of visualizations in online meetings.
Authors:Rafael da Silva Maciel, Lucio Veraldo
Abstract:
The evolution of the 5S methodology with the support of artificial intelligence techniques represents a significant opportunity to improve industrial organization audits in the automotive chain, making them more objective, efficient and aligned with Industry 4.0 standards. This work developed an automated 5S audit system based on large-scale language models (LLM), capable of assessing the five senses (Seiri, Seiton, Seiso, Seiketsu, Shitsuke) in a standardized way through intelligent image analysis. The system's reliability was validated using Cohen's concordance coefficient (kappa = 0.75), showing strong alignment between the automated assessments and the corresponding human audits. The results indicate that the proposed solution contributes significantly to continuous improvement in automotive manufacturing environments, speeding up the audit process by 50% of the traditional time and maintaining the consistency of the assessments, with a 99.8% reduction in operating costs compared to traditional manual audits. The methodology presented establishes a new paradigm for integrating lean systems with emerging AI technologies, offering scalability for implementation in automotive plants of different sizes.
Authors:Bertie Vidgen, Abby Fennelly, Evan Pinnix, Chirag Mahapatra, Zach Richards, Austin Bridges, Calix Huang, Ben Hunsberger, Fez Zafar, Brendan Foody, Dominic Barton, Cass R. Sunstein, Eric Topol, Osvald Nitski
Abstract:
We introduce the first version of the AI Productivity Index (APEX), a benchmark for assessing whether frontier AI models can perform knowledge work with high economic value. APEX addresses one of the largest inefficiencies in AI research: outside of coding, benchmarks often fail to test economically relevant capabilities. APEX-v1.0 contains 200 test cases and covers four domains: investment banking, management consulting, law, and primary medical care. It was built in three steps. First, we sourced experts with top-tier experience e.g., investment bankers from Goldman Sachs. Second, experts created prompts that reflect high-value tasks in their day-to-day work. Third, experts created rubrics for evaluating model responses. We evaluate 23 frontier models on APEX-v1.0 using an LM judge. GPT 5 (Thinking = High) achieves the highest mean score (64.2%), followed by Grok 4 (61.3%) and Gemini 2.5 Flash (Thinking = On) (60.4%). Qwen 3 235B is the best performing open-source model and seventh best overall. There is a large gap between the performance of even the best models and human experts, highlighting the need for better measurement of models' ability to produce economically valuable work.
Authors:Tianwen Zhou, Akshay Paruchuri, Josef Spjut, Kaan Akşit
Abstract:
Camera-based physiological signal estimation provides a non-contact and convenient means to monitor Heart Rate (HR). However, the presence of vital signals in facial videos raises significant privacy concerns, as they can reveal sensitive personal information related to the health and emotional states of an individual. To address this, we propose a learned framework that edits physiological signals in videos while preserving visual fidelity. First, we encode an input video into a latent space via a pretrained 3D Variational Autoencoder (3D VAE), while a target HR prompt is embedded through a frozen text encoder. We fuse them using a set of trainable spatio-temporal layers with Adaptive Layer Normalizations (AdaLN) to capture the strong temporal coherence of remote Photoplethysmography (rPPG) signals. We apply Feature-wise Linear Modulation (FiLM) in the decoder with a fine-tuned output layer to avoid the degradation of physiological signals during reconstruction, enabling accurate physiological modulation in the reconstructed video. Empirical results show that our method preserves visual quality with an average PSNR of 38.96 dB and SSIM of 0.98 on selected datasets, while achieving an average HR modulation error of 10.00 bpm MAE and 10.09% MAPE using a state-of-the-art rPPG estimator. Our design's controllable HR editing is useful for applications such as anonymizing biometric signals in real videos or synthesizing realistic videos with desired vital signs.
Authors:Tianqiang Yan, Ziqiao Lin, Sicheng Wang, Tianwei Zhang, Zhenglong Sun
Abstract:
The emergence of large pre-trained models based on natural language has breathed new life into robotics development. Extensive research has integrated large models with robots, utilizing the powerful semantic understanding and generation capabilities of large models to facilitate robot control through natural language instructions gradually. However, we found that robots that strictly adhere to human instructions, especially those containing misleading information, may encounter errors during task execution, potentially leading to safety hazards. This resembles the concept of counterfactuals in natural language processing (NLP), which has not yet attracted much attention in robotic research. In an effort to highlight this issue for future studies, this paper introduced directive counterfactuals (DCFs) arising from misleading human directives. We present DynaMIC, a framework for generating robot task flows to identify DCFs and relay feedback to humans proactively. This capability can help robots be sensitive to potential DCFs within a task, thus enhancing the reliability of the execution process. We conducted semantic-level experiments and ablation studies, showcasing the effectiveness of this framework.
Authors:Xinyu Pi, Qisen Yang, Chuong Nguyen
Abstract:
Grounded theory offers deep insights from qualitative data, but its reliance on expert-intensive manual coding presents a major scalability bottleneck. Current computational tools stop short of true automation, keeping researchers firmly in the loop. We introduce LOGOS, a novel, end-to-end framework that fully automates the grounded theory workflow, transforming raw text into a structured, hierarchical theory. LOGOS integrates LLM-driven coding, semantic clustering, graph reasoning, and a novel iterative refinement process to build highly reusable codebooks. To ensure fair comparison, we also introduce a principled 5-dimensional metric and a train-test split protocol for standardized, unbiased evaluation. Across five diverse corpora, LOGOS consistently outperforms strong baselines and achieves a remarkable $88.2\%$ alignment with an expert-developed schema on a complex dataset. LOGOS demonstrates a powerful new path to democratize and scale qualitative research without sacrificing theoretical nuance.
Authors:Kaiang Wen, Mark Roman Miller
Abstract:
As virtual reality (VR) and augmented reality (AR) continue to gain popularity, head and hand motion data captured by consumer VR systems have become ubiquitous. Prior work shows that such telemetry can be highly identifying and reflect broad user traits, often aligning with intuitive "folk theories" of body language. However, it remains unclear to what extent motion kinematics encode more nuanced cognitive states, such as confusion, hesitation, and readiness, which lack clear correlates with motion. To investigate this, we introduce a novel dataset of head and hand motion with frame-level annotations of these states collected during structured decision-making tasks. Our findings suggest that deep temporal models can infer subtle cognitive states from motion alone, achieving comparable performance with human observers. This work demonstrates that standard VR telemetry contains strong patterns related to users' internal cognitive processes, which opens the door for a new generation of adaptive virtual environments. To enhance reproducibility and support future work, we will make our dataset and modeling framework publicly available.
Authors:Edward Kim, Daniel He, Jorge Chao, Wiktor Rajca, Mohammed Amin, Nishant Malpani, Ruta Desai, Antti Oulasvirta, Bjoern Hartmann, Sanjit Seshia
Abstract:
Teaching systems physical tasks is a long standing goal in HCI, yet most prior work has focused on non collaborative physical activities. Collaborative tasks introduce added complexity, requiring systems to infer users assumptions about their teammates intent, which is an inherently ambiguous and dynamic process. This necessitates representations that are interpretable and correctable, enabling users to inspect and refine system behavior. We address this challenge by framing collaborative task learning as a program synthesis problem. Our system represents behavior as editable programs and uses narrated demonstrations, i.e. paired physical actions and natural language, as a unified modality for teaching, inspecting, and correcting system logic without requiring users to see or write code. The same modality is used for the system to communicate its learning to users. In a within subjects study, 20 users taught multiplayer soccer tactics to our system. 70 percent (14/20) of participants successfully refined learned programs to match their intent and 90 percent (18/20) found it easy to correct the programs. The study surfaced unique challenges in representing learning as programs and in enabling users to teach collaborative physical activities. We discuss these issues and outline mitigation strategies.
Authors:Ana PatrÃcia Pinto, Rute Bettencourt, Urbano J. Nunes, Gabriel Pires
Abstract:
Patients with Amyotrophic Lateral Sclerosis (ALS) progressively lose voluntary motor control, often leading to a Locked-In State (LIS), or in severe cases, a Completely Locked-in State (CLIS). Eye-tracking (ET) systems are common communication tools in early LIS but become ineffective as oculomotor function declines. EEG-based Brain-Computer Interfaces (BCIs) offer a non-muscular communication alternative, but delayed adoption may reduce performance due to diminished goal-directed thinking. This study presents a preliminary hybrid BCI framework combining ET and BCI to support a gradual transition between modalities. A group of five healthy participants tested a modified P300-based BCI. Gaze and EEG data were processed in real time, and an ET-BCI fusion algorithm was proposed to enhance detection of user intention. Results indicate that combining both modalities may maintain high accuracy and offers insights on how to potentially improve communication continuity for patients transitioning from LIS to CLIS.
Authors:Bruno M. Henrique, Eugene Santos
Abstract:
Trust calibration between humans and Artificial Intelligence (AI) is crucial for optimal decision-making in collaborative settings. Excessive trust can lead users to accept AI-generated outputs without question, overlooking critical flaws, while insufficient trust may result in disregarding valuable insights from AI systems, hindering performance. Despite its importance, there is currently no definitive and objective method for measuring trust calibration between humans and AI. Current approaches lack standardization and consistent metrics that can be broadly applied across various contexts, and they don't distinguish between the formation of opinions and subsequent human decisions. In this work, we propose a novel and objective method for dynamic trust calibration, introducing a standardized trust calibration measure and an indicator. By utilizing Contextual Bandits-an adaptive algorithm that incorporates context into decision-making-our indicator dynamically assesses when to trust AI contributions based on learned contextual information. We evaluate this indicator across three diverse datasets, demonstrating that effective trust calibration results in significant improvements in decision-making performance, as evidenced by 10 to 38% increase in reward metrics. These findings not only enhance theoretical understanding but also provide practical guidance for developing more trustworthy AI systems supporting decisions in critical domains, for example, disease diagnoses and criminal justice.
Authors:Issam Hosni, Omar Talbi
Abstract:
Crowdfunding has emerged as a vital alternative funding source, transforming how creative projects and startups secure financing by directly connecting creators to backers. However, persistent trust issues and information asymmetry between creators and backers significantly hinder its growth and development. Existing trust-enhancement mechanisms, such as third-party endorsements and basic expert validation often lack objectivity and robustness, leaving backers vulnerable to biased signals and project failures. This paper addresses these limitations by introducing a novel trust-enhancement mechanism, referred to as Double-Score Voting. This approach refines expert validation systems by integrating two critical dimensions: firstly, a granular score-based vote from experts on a project's potential, moving beyond simple binary approval; and secondly, a weighted score representing the expert's credibility and level of expertise. This dual-layered evaluation provides a more nuanced, objective, and reliable assessment of project viability. The mechanism is formalised mathematically, and its practical implementation is demonstrated through CertiFund, a prototype crowdfunding platform developed to test and validate the concept. The findings of this study demonstrate that the Double-Score Voting mechanism can significantly mitigate information asymmetry, thereby increasing the credibility of projects and fostering a more trustworthy ecosystem for both creators and backers.
Authors:Anthony Savidis, Christos Vasilopoulos
Abstract:
Software visualization seeks to represent software artifacts graphical-ly in two or three dimensions, with the goal of enhancing comprehension, anal-ysis, maintenance, and evolution of the source code. In this context, visualiza-tions employ graphical forms such as dependency structures, treemaps, or time-lines that incorporate repository histories. These visualizations allow software engineers to identify structural patterns, detect complexity hotspots, and infer system behaviors that are difficult to perceive directly from source text. By adopting metaphor-based approaches, visualization tools provide macroscopic overviews while enabling focused inspection of specific program elements, thus offering an accessible means of understanding large-scale systems. The contri-bution of our work lies in three areas. First, we introduce a configurable group-ing mechanism that supports flexible organization of code elements based on arbitrary relationships. Second, we combine fine-grained and coarse-grained software metrics to provide a multi-level perspective on system properties. Third, we present an interactive visualization engine that allows developers to dynamically adjust rendering attributes. Collectively, these advances provide a more adaptable and insightful approach to source code comprehension.
Authors:Rayhan Rashed, Farnaz Jahanbakhsh
Abstract:
Centralized content moderation paradigm both falls short and over-reaches: 1) it fails to account for the subjective nature of harm, and 2) it acts with blunt suppression in response to content deemed harmful, even when such content can be salvaged. We first investigate this through formative interviews, documenting how seemingly benign content becomes harmful due to individual life experiences. Based on these insights, we developed DIY-MOD, a browser extension that operationalizes a new paradigm: personalized content transformation. Operating on a user's own definition of harm, DIY-MOD transforms sensitive elements within content in real-time instead of suppressing the content itself. The system selects the most appropriate transformation for a piece of content from a diverse palette--from obfuscation to artistic stylizing--to match the user's specific needs while preserving the content's informational value. Our two-session user study demonstrates that this approach increases users' sense of agency and safety, enabling them to engage with content and communities they previously needed to avoid.
Authors:Vivienne Jia Zhong, Theresa Schmiedel
Abstract:
The integration of socially assistive robots (SARs) in elder care settings has the potential to address critical labor shortages while enhancing the quality of care. However, the design of SARs must align with the values of various stakeholders to ensure their acceptance and efficacy. This study empirically investigates the values that should be embedded in SARs from a multi-stakeholder perspective, including care receivers, caregivers, therapists, relatives, and other involved parties. Utilizing a combination of semi-structured interviews and focus groups, we identify a wide range of values related to safety, trust, care, privacy, and autonomy, and illustrate how stakeholders interpret these values in real-world care environments. Our findings reveal several value tensions and propose potential resolutions to these tensions. Additionally, the study highlights under-researched values such as calmness and collaboration, which are critical in fostering a supportive and efficient care environment. Our work contributes to the understanding of value-sensitive design of SARs and aids practitioners in developing SARs that align with human values, ultimately promoting socially responsible applications in elder care settings.
Authors:Seoyoung Lee, Seonbin Yoon, Seongbeen Lee, Hyesoo Kim, Joo Yong Sim
Abstract:
GUI task automation streamlines repetitive tasks, but existing LLM or VLM-based planner-executor agents suffer from brittle generalization, high latency, and limited long-horizon coherence. Their reliance on single-shot reasoning or static plans makes them fragile under UI changes or complex tasks. Log2Plan addresses these limitations by combining a structured two-level planning framework with a task mining approach over user behavior logs, enabling robust and adaptable GUI automation. Log2Plan constructs high-level plans by mapping user commands to a structured task dictionary, enabling consistent and generalizable automation. To support personalization and reuse, it employs a task mining approach from user behavior logs that identifies user-specific patterns. These high-level plans are then grounded into low-level action sequences by interpreting real-time GUI context, ensuring robust execution across varying interfaces. We evaluated Log2Plan on 200 real-world tasks, demonstrating significant improvements in task success rate and execution time. Notably, it maintains over 60.0% success rate even on long-horizon task sequences, highlighting its robustness in complex, multi-step workflows.
Authors:Saurav Jha, Stefan K. Ehrlich
Abstract:
Healthcare robotics requires robust multimodal perception and reasoning to ensure safety in dynamic clinical environments. Current Vision-Language Models (VLMs) demonstrate strong general-purpose capabilities but remain limited in temporal reasoning, uncertainty estimation, and structured outputs needed for robotic planning. We present a lightweight agentic multimodal framework for video-based scene understanding. Combining the Qwen2.5-VL-3B-Instruct model with a SmolAgent-based orchestration layer, it supports chain-of-thought reasoning, speech-vision fusion, and dynamic tool invocation. The framework generates structured scene graphs and leverages a hybrid retrieval module for interpretable and adaptive reasoning. Evaluations on the Video-MME benchmark and a custom clinical dataset show competitive accuracy and improved robustness compared to state-of-the-art VLMs, demonstrating its potential for applications in robot-assisted surgery, patient monitoring, and decision support.
Authors:Pawel Weichbroth, Tomasz Szot
Abstract:
Nowadays, mobile applications are essential tools for everyday life, providing users with anytime, anywhere access to up-to-date information, communication, and entertainment. Needless to say, hardware limitations and the diverse needs of different user groups pose a number of design and development challenges. According to recent studies, usability is one of the most revealing among many others. However, few have made the direct effort to provide and discuss what countermeasures can be applied to avoid usability issues in mobile application development. Through a survey of 20 mobile software design and development practitioners, this study aims to fill this research gap. Given the qualitative nature of the data collected, and with the goal of capturing and preserving the intrinsic meanings embedded in the experts' statements, we adopted in vivo coding. The analysis of the collected material enabled us to develop a novel framework consisting of ten guidelines and three activities with general applications. In addition, it can be noted that active collaboration with users in testing and collecting feedback was often emphasized at each stage of mobile application development. Future research should consider focused action research that evaluates the effectiveness of our recommendations and validates them across different stakeholder groups. In this regard, the development of automated tools to support early detection and mitigation of usability issues during mobile application development could also be considered.
Authors:Yuanning Han, Ziyi Qiu, Jiale Cheng, RAY LC
Abstract:
Studies of Generative AI (GenAI)-assisted creative workflows have focused on individuals overcoming challenges of prompting to produce what they envisioned. When designers work in teams, how do collaboration and prompting influence each other, and how do users perceive generative AI and their collaborators during the co-prompting process? We engaged students with design or performance backgrounds, and little exposure to GenAI, to work in pairs with GenAI to create stage designs based on a creative theme. We found two patterns of collaborative prompting focused on generating story descriptions first, or visual imagery first. GenAI tools helped participants build consensus in the task, and allowed for discussion of the prompting strategies. Participants perceived GenAI as efficient tools rather than true collaborators, suggesting that human partners reduced the reliance on their use. This work highlights the importance of human-human collaboration when working with GenAI tools, suggesting systems that take advantage of shared human expertise in the prompting process.
Authors:Lihua Du, Xing Lyu, Lezi Xie, Bo Feng
Abstract:
AI sycophancy is increasingly recognized as a harmful alignment, but research remains fragmented and underdeveloped at the conceptual level. This article redefines AI sycophancy as the tendency of large language models (LLMs) and other interactive AI systems to excessively and/or uncritically validate, amplify, or align with a user's assertions-whether these concern factual information, cognitive evaluations, or affective states. Within this framework, we distinguish three types of sycophancy: informational, cognitive, and affective. We also introduce personalization at the message level and critical prompting at the conversation level as key dimensions for distinguishing and examining different manifestations of AI sycophancy. Finally, we propose the AI Sycophancy Processing Model (AISPM) to examine the antecedents, outcomes, and psychological mechanisms through which sycophantic AI responses shape user experiences. By embedding AI sycophancy in the broader landscape of communication theory and research, this article seeks to unify perspectives, clarify conceptual boundaries, and provide a foundation for systematic, theory-driven investigations.
Authors:Ahmed Jaber, Wangshu Zhu, Karthick Jayavelu, Justin Downes, Sameer Mohamed, Candace Agonafir, Linnia Hawkins, Tian Zheng
Abstract:
Climate data science faces persistent barriers stemming from the fragmented nature of data sources, heterogeneous formats, and the steep technical expertise required to identify, acquire, and process datasets. These challenges limit participation, slow discovery, and reduce the reproducibility of scientific workflows. In this paper, we present a proof of concept for addressing these barriers through the integration of a curated knowledge graph (KG) with AI agents designed for cloud-native scientific workflows. The KG provides a unifying layer that organizes datasets, tools, and workflows, while AI agents -- powered by generative AI services -- enable natural language interaction, automated data access, and streamlined analysis. Together, these components drastically lower the technical threshold for engaging in climate data science, enabling non-specialist users to identify and analyze relevant datasets. By leveraging existing cloud-ready API data portals, we demonstrate that "a knowledge graph is all you need" to unlock scalable and agentic workflows for scientific inquiry. The open-source design of our system further supports community contributions, ensuring that the KG and associated tools can evolve as a shared commons. Our results illustrate a pathway toward democratizing access to climate data and establishing a reproducible, extensible framework for human--AI collaboration in scientific research.
Authors:Jianan Zhou, Fleur Corbett, Joori Byun, Talya Porat, Nejra van Zalk
Abstract:
Interactive intelligent agents are being integrated across society. Despite achieving human-like capabilities, humans' responses to these agents remain poorly understood, with research fragmented across disciplines. We conducted a first systematic synthesis comparing a range of psychological and behavioural responses in matched human-agent vs. human-human dyadic interactions. A total of 162 eligible studies (146 contributed to the meta-analysis; 468 effect sizes) were included in the systematic review and meta-analysis, which integrated frequentist and Bayesian approaches. Our results indicate that individuals exhibited less prosocial behaviour and moral engagement when interacting with agents vs. humans. They attributed less agency and responsibility to agents, perceiving them as less competent, likeable, and socially present. In contrast, individuals' social alignment (i.e., alignment or adaptation of internal states and behaviours with partners), trust in partners, personal agency, task performance, and interaction experiences were generally comparable when interacting with agents vs. humans. We observed high effect-size heterogeneity for many subjective responses (i.e., social perceptions of partners, subjective trust, and interaction experiences), suggesting context-dependency of partner effects. By examining the characteristics of studies, participants, partners, interaction scenarios, and response measures, we also identified several moderators shaping partner effects. Overall, functional behaviours and interactive experiences with agents can resemble those with humans, whereas fundamental social attributions and moral/prosocial concerns lag in human-agent interactions. Agents are thus afforded instrumental value on par with humans but lack comparable intrinsic value, providing practical implications for agent design and regulation.
Authors:Adrian Kuenzler, Stefan Schmid
Abstract:
Large language models (LLMs) are increasingly central to many applications, raising concerns about bias, fairness, and regulatory compliance. This paper reviews risks of biased outputs and their societal impact, focusing on frameworks like the EU's AI Act and the Digital Services Act. We argue that beyond constant regulation, stronger attention to competition and design governance is needed to ensure fair, trustworthy AI. This is a preprint of the Communications of the ACM article of the same title.
Authors:Samangi Wadinambiarachchi, Jenny Waycott, Yvonne Rogers, Greg Wadley
Abstract:
As designers become familiar with Generative AI, a new concept is emerging: Agentic AI. While generative AI produces output in response to prompts, agentic AI systems promise to perform mundane tasks autonomously, potentially freeing designers to focus on what they love: being creative. But how do designers feel about integrating agentic AI systems into their workflows? Through design fiction, we investigated how designers want to interact with a collaborative agentic AI platform. Ten professional designers imagined and discussed collaborating with an AI agent to organise inspiration sources and ideate. Our findings highlight the roles AI agents can play in supporting designers, the division of authority between humans and AI, and how designers' intent can be explained to AI agents beyond prompts. We synthesise our findings into a conceptual framework that identifies authority distribution among humans and AI agents and discuss directions for utilising AI agents in future design workflows.
Authors:Jiazheng Sun, Te Yang, Jiayang Niu, Mingxuan Li, Yongyong Lu, Ruimeng Yang, Xin Peng
Abstract:
Large multi-modal models (LMMs) have advanced mobile GUI agents. However, existing methods struggle with real-world scenarios involving diverse app interfaces and evolving user needs. End-to-end methods relying on model's commonsense often fail on long-tail apps, and agents without user interaction act unilaterally, harming user experience. To address these limitations, we propose Fairy, an interactive multi-agent mobile assistant capable of continuously accumulating app knowledge and self-evolving during usage. Fairy enables cross-app collaboration, interactive execution, and continual learning through three core modules:(i) a Global Task Planner that decomposes user tasks into sub-tasks from a cross-app view; (ii) an App-Level Executor that refines sub-tasks into steps and actions based on long- and short-term memory, achieving precise execution and user interaction via four core agents operating in dual loops; and (iii) a Self-Learner that consolidates execution experience into App Map and Tricks. To evaluate Fairy, we introduce RealMobile-Eval, a real-world benchmark with a comprehensive metric suite, and LMM-based agents for automated scoring. Experiments show that Fairy with GPT-4o backbone outperforms the previous SoTA by improving user requirement completion by 33.7% and reducing redundant steps by 58.5%, showing the effectiveness of its interaction and self-learning.
Authors:Oluwole Adewusi, Wallace S. Msagusa, Jean Pierre Imanirumva, Okemawo Obadofin, Jema D. Ndibwile
Abstract:
The rapid adoption of Mobile Money Services (MMS) in Sub-Saharan Africa (SSA) offers a viable path to improve e-Government service accessibility in the face of persistent low internet penetration. However, existing Mobile Money Authentication (MMA) methods face critical limitations, including susceptibility to SIM swapping, weak session protection, and poor scalability during peak demand. This study introduces a hybrid MMA framework that combines Unstructured Supplementary Service Data (USSD)-based multi-factor authentication with secure session management via cryptographically bound JSON Web Tokens (JWT). Unlike traditional MMA systems that rely solely on SIM-PIN verification or smartphone-dependent biometrics, our design implements a three-factor authentication model; SIM verification, PIN entry, and session token binding, tailored for resource-constrained environments. Simulations and comparative analysis against OAuth-based Single Sign-On (SSO) methods reveal a 45% faster authentication time (8 seconds vs. 12 to 15 seconds), 15% higher success under poor network conditions (95% vs. 80%), and increased resistance to phishing and brute-force attacks. Penetration testing and threat modeling further demonstrate a substantial reduction in vulnerability exposure compared to conventional approaches. The primary contributions of this work are: (1) a hybrid authentication protocol that ensures offline accessibility and secure session continuity; (2) a tailored security framework addressing threats like SIM swapping and social engineering in SSA; and (3) demonstrated scalability for thousands of users with reduced infrastructure overhead. The proposed approach advances secure digital inclusion in SSA and other regions with similar constraints.
Authors:Fadjimata I Anaroua, Qing Li, Yan Tang, Hong P. Liu
Abstract:
This paper presents VITA (Virtual Teaching Assistants), an adaptive distributed learning (ADL) platform that embeds a large language model (LLM)-powered chatbot (BotCaptain) to provide dialogic support, interoperable analytics, and integrity-aware assessment for workforce preparation in data science. The platform couples context-aware conversational tutoring with formative-assessment patterns designed to promote reflective reasoning. The paper describes an end-to-end data pipeline that transforms chat logs into Experience API (xAPI) statements, instructor dashboards that surface outliers for just-in-time intervention, and an adaptive pathway engine that routes learners among progression, reinforcement, and remediation content. The paper also benchmarks VITA conceptually against emerging tutoring architectures, including retrieval-augmented generation (RAG)--based assistants and Learning Tools Interoperability (LTI)--integrated hubs, highlighting trade-offs among content grounding, interoperability, and deployment complexity. Contributions include a reusable architecture for interoperable conversational analytics, a catalog of patterns for integrity-preserving formative assessment, and a practical blueprint for integrating adaptive pathways into data-science courses. The paper concludes with implementation lessons and a roadmap (RAG integration, hallucination mitigation, and LTI~1.3 / OpenID Connect) to guide multi-course evaluations and broader adoption. In light of growing demand and scalability constraints in traditional instruction, the approach illustrates how conversational AI can support engagement, timely feedback, and personalized learning at scale. Future work will refine the platform's adaptive intelligence and examine applicability across varied educational settings.
Authors:Alexander Rind, Julia Boeck
Abstract:
Social workers need visual tools to collect information about their client's life situation, so that they can reflect it together and choose tailored interventions. easyNWK and easyBiograph are two visual tools for the client's social network and life history. We recently redesigned both tools in a participatory design project with social work faculty and professionals. In this short paper we discuss these tools from perspective of input visualization systems.
Authors:Jatin Agarwala, George Paul, Nemani Harsha Vardhan, Vinoo Alluri
Abstract:
Music engagement spans diverse interactions with music, from selection and emotional response to its impact on behavior, identity, and social connections. Social media platforms provide spaces where such engagement can be observed in natural, unprompted conversations. Advances in natural language processing (NLP) and big data analytics make it possible to analyze these discussions at scale, extending music research to broader contexts. Reddit, in particular, offers anonymity that encourages diverse participation and yields rich discourse on music in ecological settings. Yet the scale of this data requires tools to extract, process, and analyze it effectively. We present Muse-it, a platform that retrieves comprehensive Reddit data centered on user-defined queries. It aggregates posts from across subreddits, supports topic modeling, temporal trend analysis, and clustering, and enables efficient study of large-scale discourse. Muse-it also identifies music-related hyperlinks (e.g., Spotify), retrieves track-level metadata such as artist, album, release date, genre, popularity, and lyrics, and links these to the discussions. An interactive interface provides dynamic visualizations of the collected data. Muse-it thus offers an accessible way for music researchers to gather and analyze big data, opening new avenues for understanding music engagement as it naturally unfolds online.
Authors:Eleftherios Papadopoulos, Yagmur Güçlütürk
Abstract:
Visual impairments present significant challenges to individuals worldwide, impacting daily activities and quality of life. Visual neuroprosthetics offer a promising solution, leveraging advancements in technology to provide a simplified visual sense through devices comprising cameras, computers, and implanted electrodes. This study investigates user-centered design principles for a phosphene vision algorithm, utilizing feedback from visually impaired individuals to guide the development of a gaze-controlled semantic segmentation system. We conducted interviews revealing key design principles. These principles informed the implementation of a gaze-guided semantic segmentation algorithm using the Segment Anything Model (SAM). In a simulated phosphene vision environment, participants performed object detection tasks under SAM, edge detection, and normal vision conditions. SAM improved identification accuracy over edge detection, remained effective in complex scenes, and was particularly robust for specific object shapes. These findings demonstrate the value of user feedback and the potential of gaze-guided semantic segmentation to enhance neuroprosthetic vision.
Authors:Seongsil Heo, Roberto Manduchi
Abstract:
Understanding user intent during magnified reading is critical for accessible interface design. Yet magnification collapses visual context and forces continual viewport dragging, producing fragmented, noisy gaze and obscuring reading intent. We present a semi-supervised framework that learns intention-aware gaze representations by leveraging mouse trajectories as weak supervision. The model is first pretrained to predict mouse velocity from unlabeled gaze, then fine-tuned to classify reading versus scanning. To address magnification-induced distortions, we jointly model raw gaze within the magnified viewport and a compensated view remapped to the original screen, which restores spatial continuity across lines and paragraphs. Across text and webpage datasets, our approach consistently outperforms supervised baselines, with semi-supervised pretraining yielding up to 7.5% F1 improvement in challenging settings. These findings highlight the value of behavior-driven pretraining for robust, gaze-only interaction, paving the way for adaptive, hands-free accessibility tools.
Authors:Kefan Xu, Rosa I. Arriaga
Abstract:
The sedentary lifestyle increases individuals' risks of developing chronic diseases. To support individuals to be more physically active, we propose a mobile system, MotionShift, that presents users with step count data alongside contextual information (e.g., location, weather, calendar events, etc.) and self-reported records. By implementing and deploying this system, we aim to understand how contextual information impacts individuals' sense-making on sensor-captured data and how individuals leverage contextualized data to identify and reduce sedentary activities. The findings will advance the design of context-aware personal informatics systems, empowering users to derive actionable insights from sensor data while minimizing interpretation biases, ultimately promoting opportunities to be more physically active.
Authors:Muhammad Kaif Laghari, Areeb Ahmed Shaikh, Faiz Khan, Aafia Gul Siddiqui
Abstract:
The adoption of current mixed reality (MR) content creation is primarily based on external PC-centric platforms and third-party cameras, limiting adoption for standalone virtual reality (VR) users. In this work, we investigate the feasibility of integrating an enhanced LIV SDK-like MR compositing pipeline into the Meta Quest 3 hardware, enabling native first-person physical perspective (FPP) MR content creation without external infrastructure. We conducted a simulation-based feasibility study using hardware specifications, developer documentation, and benchmarking with ARM-based SoCs, including Snapdragon 8 Gen 3 and MediaTek Dimensity 9300. The approach suggested Camera Passthrough Enhancement using Meta's experimental Passthrough Camera API with on-device machine learning segmentation through Unity Sentis and FastSAM, and an optimized real-time compositing engine for standalone VR. Benchmarking results show that Quest 3's Snapdragon XR2 Gen 2 can support lightweight native MR compositing at 720p30 resolution using 95\% resource utilization, leaving 5\% thermal headroom for sustained runtime. Comparison with next-generation SoCs such as Snapdragon 8 Gen 3 demonstrates 34\% headroom, enabling more robust MR experiences with 1.5--2x faster CPU/GPU performance and higher memory bandwidth. While current Quest 3 hardware supports basic native MR compositing, thermal limits restrict operation to 5--10 minutes before throttling. Experimental results confirm standalone MR content creation is possible on current hardware for short recordings, with new XR SoCs offering the headroom for extended sessions and improved quality. These findings lay groundwork for transitioning MR content creation from PC-based workflows to all-in-one VR devices, enhancing MR production for content creators and researchers.
Authors:Navya Tiwari, Joseph Vazhaeparampil, Victoria Preston
Abstract:
Uncontrolled intersections account for a significant fraction of roadway crashes due to ambiguous right-of-way rules, occlusions, and unpredictable driver behavior. While autonomous vehicle research has explored uncertainty-aware decision making, few systems exist to retrofit human-operated vehicles with assistive navigation support. We present a driver-assist framework for right-of-way reasoning at uncontrolled intersections, formulated as a Partially Observable Markov Decision Process (POMDP). Using a custom simulation testbed with stochastic traffic agents, pedestrians, occlusions, and adversarial scenarios, we evaluate four decision-making approaches: a deterministic finite state machine (FSM), and three probabilistic planners: QMDP, POMCP, and DESPOT. Results show that probabilistic planners outperform the rule-based baseline, achieving up to 97.5 percent collision-free navigation under partial observability, with POMCP prioritizing safety and DESPOT balancing efficiency and runtime feasibility. Our findings highlight the importance of uncertainty-aware planning for driver assistance and motivate future integration of sensor fusion and environment perception modules for real-time deployment in realistic traffic environments.
Authors:Anoushka Puranik, Ester Chen, Roshan L Peiris, Ha-Kyung Kong
Abstract:
Qualitative research offers deep insights into human experiences, but its processes, such as coding and thematic analysis, are time-intensive and laborious. Recent advancements in qualitative data analysis (QDA) tools have introduced AI capabilities, allowing researchers to handle large datasets and automate labor-intensive tasks. However, qualitative researchers have expressed concerns about AI's lack of contextual understanding and its potential to overshadow the collaborative and interpretive nature of their work. This study investigates researchers' preferences among three degrees of delegation of AI in QDA (human-only, human-initiated, and AI-initiated coding) and explores factors influencing these preferences. Through interviews with 16 qualitative researchers, we identified efficiency, ownership, and trust as essential factors in determining the desired degree of delegation. Our findings highlight researchers' openness to AI as a supportive tool while emphasizing the importance of human oversight and transparency in automation. Based on the results, we discuss three factors of trust in AI for QDA and potential ways to strengthen collaborative efforts in QDA and decrease bias during analysis.
Authors:Riccardo Cadei, Christian Internò
Abstract:
Modern foundational models increasingly reflect not just world knowledge, but patterns of human preference embedded in their training data. We hypothesize that recursive alignment-via human feedback and model-generated corpora-induces a social desirability bias, nudging models to favor agreeable or flattering responses over objective reasoning. We refer to it as the Narcissus Hypothesis and test it across 31 models using standardized personality assessments and a novel Social Desirability Bias score. Results reveal a significant drift toward socially conforming traits, with profound implications for corpus integrity and the reliability of downstream inferences. We then offer a novel epistemological interpretation, tracing how recursive bias may collapse higher-order reasoning down Pearl's Ladder of Causality, culminating in what we refer to as the Rung of Illusion.
Authors:Austin Wilson, Sahar Kapasi, Zane Greene, Alexis E. Block
Abstract:
Many research groups face challenges when legacy (unsupported) robotic platforms lose manufacturer support and cannot accommodate modern sensing, speech, and interaction capabilities. We present the Enhanced NAO, a revitalized version of Aldebaran's NAO robot that uses upgraded microphones, RGB-D and thermal cameras, and additional compute resources in a fully self-contained package. This system combines cloud and local models for perception and dialogue, while preserving the NAO's expressive body and behaviors. In a pilot validation study, the Enhanced NAO delivered significantly higher conversational quality and stronger user preference compared to the NAO AI Edition, without increasing response latency. Key upgrades, such as beamforming microphones and low-latency audio processing, reduced artifacts like self-hearing and improved multi-party separation. Expanded visual and thermal sensing established a foundation for future interaction capabilities. Beyond the NAO, our framework provides a platform-agnostic strategy for extending the lifespan and research utility of legacy robots, ensuring they remain valuable tools for human-robot interaction.
Authors:Zhenghao Wang, Shuo Xiong
Abstract:
In this paper, we establish structural analogies between core concepts in quantum mechanics and games. By constructing the Quantum Coin Toss on a quantum circuit, we preliminarily investigate the similarity between quantum system behavior and game behavior, thereby formulating the state-operation paradigm. Using this paradigm, we introduce the conceptual prototype of the State Space Interpretation (SSI). Based on mathematical and physical theories, particularly linear algebra, quantum mechanics, and statistical mechanics, we define formal constructs including state space, evolution path, and derived concepts. With the SSI, a game is conceptualized as a state space, while a gameplay process corresponds to an evolution path within this space. We propose that the SSI constitutes a novel interpretation framework for game design and game studies. This framework aims to enhance understanding of games and function as a link between game studies and related fields.
Authors:Jiaju Ma, Chau Vu, Asya Lyubavina, Catherine Liu, Jingyi Li
Abstract:
One way illustrators engage in disciplined drawing - the process of drawing to improve technical skills - is through studying and replicating reference images. However, for many novice and intermediate digital artists, knowing how to approach studying a reference image can be challenging. It can also be difficult to receive immediate feedback on their works-in-progress. To help these users develop their professional vision, we propose ArtKrit, a tool that scaffolds the process of replicating a reference image into three main steps: composition, value, and color. At each step, our tool offers computational guidance, such as adaptive composition line generation, and automatic feedback, such as value and color accuracy. Evaluating this tool with intermediate digital artists revealed that ArtKrit could flexibly accommodate their unique workflows. Our code and supplemental materials are available at https://majiaju.io/artkrit .
Authors:Chishang Yang, Xiang Chang, Debargha Dey, Avi Parush, Wendy Ju
Abstract:
Social scientists have argued that autonomous vehicles (AVs) need to act as effective social agents; they have to respond implicitly to other drivers' behaviors as human drivers would. In this paper, we investigate how contingent driving behavior in AVs influences human drivers' experiences. We compared three algorithmic driving models: one trained on human driving data that responds to interactions (a familiar contingent behavior) and two artificial models that intend to either always-yield or never-yield regardless of how the interaction unfolds (non-contingent behaviors). Results show a statistically significant relationship between familiar contingent behavior and positive driver experiences, reducing stress while promoting the decisive interactions that mitigate driver hesitance. The direct relationship between familiar contingency and positive experience indicates that AVs should incorporate socially familiar driving patterns through contextually-adaptive algorithms to improve the chances of successful deployment and acceptance in mixed human-AV traffic environments.
Authors:Haoze Guo, Ziqi Wei
Abstract:
With social media content traversing the different platforms, occasionally resurfacing after periods of time, users are increasingly prone to unintended disclosure resulting from a misremembered acceptance of privacy. Context collapse and interface cues are two factors considered by prior researchers, yet we know less about how time-lapse basically alters recall of past audiences destined for exposure. Likewise, the design space for mitigating this temporal exposure risk remains underexplored. Our work theorizes temporal drift in privacy recall as verbatim memory of prior settings blowing apart and eventually settling with gist-based heuristics, which more often than not select an audience larger than the original one. Grounded in memory research, contextual integrity, and usable privacy, we examine why such a drift occurs, why it tends to bias toward broader sharing, and how it compounds upon repeat exposure. Following that, we suggest provenance-forward interface schemes and a risk-based evaluation framework that mutates recall into recognition. The merit of our work lies in establishing a temporal awareness of privacy design as an essential safety rail against inadvertent overexposure.
Authors:Ettilla Mohiuddin Eumi, Hussein Abbass, Nadine Marcus
Abstract:
Traditional Human-Swarm Interaction (HSI) methods often lack intuitive real-time adaptive interfaces, making decision making slower and increasing cognitive load while limiting command flexibility. To solve this, we present SwarmChat, a context-aware, multimodal interaction system powered by Large Language Models (LLMs). SwarmChat enables users to issue natural language commands to robotic swarms using multiple modalities, such as text, voice, or teleoperation. The system integrates four LLM-based modules: Context Generator, Intent Recognition, Task Planner, and Modality Selector. These modules collaboratively generate context from keywords, detect user intent, adapt commands based on real-time robot state, and suggest optimal communication modalities. Its three-layer architecture offers a dynamic interface with both fixed and customizable command options, supporting flexible control while optimizing cognitive effort. The preliminary evaluation also shows that the SwarmChat's LLM modules provide accurate context interpretation, relevant intent recognition, and effective command delivery, achieving high user satisfaction.
Authors:Mattea Reid, Zuhairah Zainal, Khaing Zin Than, Danielle Chan, Jonathan Chan
Abstract:
Machine learning is gaining significant attention as a diagnostic tool in medical imaging, particularly in the analysis of retinal fundus images. However, this approach is not yet clinically applicable, as it still depends on human validation from a professional. Therefore, we present the design for a mobile application that monitors metrics related to retinal fundus images correlating to age-related conditions. The purpose of this platform is to observe for a change in these metrics over time, offering early insights into potential ocular diseases without explicitly delivering diagnostics. Metrics analysed include vessel tortuosity, as well as signs of glaucoma, retinopathy and macular edema. To evaluate retinopathy grade and risk of macular edema, a model was trained on the Messidor dataset and compared to a similar model trained on the MAPLES-DR dataset. Information from the DeepSeeNet glaucoma detection model, as well as tortuosity calculations, is additionally incorporated to ultimately present a retinal fundus image monitoring platform. As a result, the mobile application permits monitoring of trends or changes in ocular metrics correlated to age-related conditions with regularly uploaded photographs.
Authors:Alicia E. Boyd, Danielle Cummings, Angie Zhang
Abstract:
How do we create ethical and equitable experiences on global platforms? How might UX designers and developers incorporate reflexive practices--a continuous self-evaluation of one's assumptions and biases--to mitigate assumptions and workers' experience? This tutorial will explore ways to build equitable user experiences using gig work platforms as a target use case. With the rise of gig work platforms, the informal digital economy has altered how algorithmic systems manage occasional workers; its questionable assumptions have spread worldwide. Concerns over autonomy, gamification, and worker privacy and safety are amplified as these practices expand worldwide. We will practice reflexive techniques within this context by implementing an equity-focused journey-mapping experience. Journey mapping allows designers to map out the customer experience and identify potential pain points at each step that could hinder the user experience. Using a ride-sharing scenario, participants will be guided through a custom journey map highlighting equitable considerations that can facilitate responsible user experience innovation. NOTE: The tutorial was presented at Fairness, Accountability and Transparency Conference (FAccT '24) in Rio de Janeiro.
Authors:Mohammed Al Owayyed, Adarsh Denga, Willem-Paul Brinkman
Abstract:
Child helpline training often relies on human-led roleplay, which is both time- and resource-consuming. To address this, rule-based interactive agent simulations have been proposed to provide a structured training experience for new counsellors. However, these agents might suffer from limited language understanding and response variety. To overcome these limitations, we present a hybrid interactive agent that integrates Large Language Models (LLMs) into a rule-based Belief-Desire-Intention (BDI) framework, simulating more realistic virtual child chat conversations. This hybrid solution incorporates LLMs into three components: intent recognition, response generation, and a bypass mechanism. We evaluated the system through two studies: a script-based assessment comparing LLM-generated responses to human-crafted responses, and a within-subject experiment (N=37) comparing the LLM-integrated agent with a rule-based version. The first study provided evidence that the three LLM components were non-inferior to human-crafted responses. In the second study, we found credible support for two hypotheses: participants perceived the LLM-integrated agent as more believable and reported more positive attitudes toward it than the rule-based agent. Additionally, although weaker, there was some support for increased engagement (posterior probability = 0.845, 95% HDI [-0.149, 0.465]). Our findings demonstrate the potential of integrating LLMs into rule-based systems, offering a promising direction for more flexible but controlled training systems.
Authors:Lingyu Peng, Chang Ge, Liying Long, Xin Li, Xiao Hu, Pengda Lu, Qingchuan Li, Jiangyue Wu
Abstract:
This artwork presents an interdisciplinary interaction installation that visualizes collective online mourning behavior in China. By focusing on commemorative content posted on Sina Weibo following the deaths of seven prominent Chinese authors, the artwork employs data scraping, natural language processing, and 3D modeling to transform fragmented textual expressions into immersive digital monuments. Through the analysis of word frequencies, topic models, and user engagement metrics, the system constructs a semantic-visual landscape that reflects both authorial legacies and collective memory. This research contributes to the fields of digital humanities, visualization design, and digital memorial architecture by proposing a novel approach for preserving and reactivating collective memory in the digital age.
Authors:Muhammad Hamza, Danish Hamid, Muhammad Tahir Akram
Abstract:
Human-Object Interaction Recognition (HOIR) and user identification play a crucial role in advancing augmented reality (AR)-based personalized assistive technologies. These systems are increasingly being deployed in high-stakes, human-centric environments such as aircraft cockpits, aerospace maintenance, and surgical procedures. This research introduces I2S (Interact2Sign), a multi stage framework designed for unobtrusive user identification through human object interaction recognition, leveraging 3D hand pose analysis in egocentric videos. I2S utilizes handcrafted features extracted from 3D hand poses and per forms sequential feature augmentation: first identifying the object class, followed by HOI recognition, and ultimately, user identification. A comprehensive feature extraction and description process was carried out for 3D hand poses, organizing the extracted features into semantically meaningful categories: Spatial, Frequency, Kinematic, Orientation, and a novel descriptor introduced in this work, the Inter-Hand Spatial Envelope (IHSE). Extensive ablation studies were conducted to determine the most effective combination of features. The optimal configuration achieved an impressive average F1-score of 97.52% for user identification, evaluated on a bimanual object manipulation dataset derived from the ARCTIC and H2O datasets. I2S demonstrates state-of-the-art performance while maintaining a lightweight model size of under 4 MB and a fast inference time of 0.1 seconds. These characteristics make the proposed framework highly suitable for real-time, on-device authentication in security-critical, AR-based systems.
Authors:Devin Lange, Zach Cutler, Maxim Lisnic
Abstract:
Many sophisticated tools exist to help researchers find the academic literature they are searching for, but what about finding work that you aren't looking for? We promote joyful discovery of visualization research through two games (Colon and Authored) available to play now at https://games.vispubs.com. We believe these games provide several benefits to the visualization research community. First, the joyful discovery of visualization research and researchers occurs because these games randomly select authors and publications, thus exposing players to research areas they may not typically engage with. Second, these games were made by visualization researchers for visualization researchers; playing this game, sharing results with friends in person and online, has the potential to strengthen our academic community. Third, games centered around publication authors provide a passive way for academics to gain exposure within the community. Finally, we hope these games are simply fun to play. Try them now at games.vispubs.com.
Authors:Bahare Riahi, Veronica Catete
Abstract:
There is an increasing imperative to integrate programming platforms within AI frameworks to enhance educational tasks for both teachers and students. However, commonly used platforms such as Code.org, Scratch, and Snap fall short of providing the desired AI features and lack adaptability for interdisciplinary applications. This study explores how educational platforms can be improved by incorporating AI and analytics features to create more effective learning environments across various subjects and domains. We interviewed 8 K-12 teachers and asked their practices and needs while using any block-based programming (BBP) platform in their classes. We asked for their approaches in assessment, course development and expansion of resources, and student monitoring in their classes. Thematic analysis of the interview transcripts revealed both commonalities and differences in the AI tools needed between the STEM and non-STEM groups. Our results indicated advanced AI features that could promote BBP platforms. Both groups stressed the need for integrity and plagiarism checks, AI adaptability, customized rubrics, and detailed feedback in assessments. Non-STEM teachers also emphasized the importance of creative assignments and qualitative assessments. Regarding resource development, both AI tools desired for updating curricula, tutoring libraries, and generative AI features. Non-STEM teachers were particularly interested in supporting creative endeavors, such as art simulations. For student monitoring, both groups prioritized desktop control, daily tracking, behavior monitoring, and distraction prevention tools. Our findings identify specific AI-enhanced features needed by K-12 teachers across various disciplines and lay the foundation for creating more efficient, personalized, and engaging educational experiences.
Authors:Martin Lou, Jackie Crowley, Samuel Dodson, Dongwook Yoon
Abstract:
Generative AI is increasingly integrated into writing support, yet current chat-based interfaces often obscure referential context and risk amplifying automation bias and overreliance. We introduce AnchoredAI, a novel system that anchors AI feedback directly to relevant text spans. AnchoredAI implements two key mechanisms: (1) an Anchoring Context Window (ACW) that maintains unique, context-rich references, and (2) an update-aware context retrieval method that preserves the intent of prior comments after document edits. In a controlled user study, we compared AnchoredAI to a chat-based LLM interface. Results show that AnchoredAI led to more targeted revisions while fostering a stronger agency metrics (e.g., control and ownership) among writers. These findings highlight how interface design shapes AI-assisted writing, suggesting that anchoring can mitigate overreliance and enable more precise, user-driven revision practices.
Authors:Michael Faber, Andrey Grishko, Julian Waksberg, David Pardo, Tomer Leivy, Yuval Hazan, Emanuel Talmansky, Benny Megidish, Hadas Erel
Abstract:
Robots come in various forms and have different characteristics that may shape the interaction with them. In human-human interactions, height is a characteristic that shapes human dynamics, with taller people typically perceived as more persuasive. In this work, we aspired to evaluate if the same impact replicates in a human-robot interaction and specifically with a highly non-humanoid robotic object. The robot was designed with modules that could be easily added or removed, allowing us to change its height without altering other design features. To test the impact of the robot's height, we evaluated participants' compliance with its request to volunteer to perform a tedious task. In the experiment, participants performed a cognitive task on a computer, which was framed as the main experiment. When done, they were informed that the experiment was completed. While waiting to receive their credits, the robotic object, designed as a mobile robotic service table, entered the room, carrying a tablet that invited participants to complete a 300-question questionnaire voluntarily. We compared participants' compliance in two conditions: A Short robot composed of two modules and 95cm in height and a Tall robot consisting of three modules and 132cm in height. Our findings revealed higher compliance with the Short robot's request, demonstrating an opposite pattern to human dynamics. We conclude that while height has a substantial social impact on human-robot interactions, it follows a unique pattern of influence. Our findings suggest that designers cannot simply adopt and implement elements from human social dynamics to robots without testing them first.
Authors:Reza Shahriari, Eric D. Ragan
Abstract:
The exchange of personal information in digital environments poses significant risks, including identity theft, privacy breaches, and data misuse. Addressing these challenges requires a deep understanding of user behavior and mental models in diverse contexts. This paper presents a systematic literature review of empirical user studies on unintentional information disclosure in usable security, covering 101 papers published across six leading conferences from 2018 to 2023. The studies are categorized based on methodologies-quantitative and qualitative-and analyzed for their applications in various scenarios. Major subtopics, including data privacy, security in browsers, and privacy tools, are examined to highlight research trends and focal areas. This review provides details on topics and application areas that have received the most research attention. Moreover, by comparing descriptive and experimental approaches, findings aim to guide researchers of strategies to mitigate risks associated with online everyday interaction.
Authors:Zhuoyue Zhang, Haitong Xu
Abstract:
Autonomous navigation in maritime domains is accelerating alongside advances in artificial intelligence, sensing, and connectivity. Opaque decision-making and poorly calibrated human-automation interaction remain key barriers to safe adoption. This article synthesizes 100 studies on automation transparency for Maritime Autonomous Surface Ships (MASS) spanning situation awareness (SA), human factors, interface design, and regulation. We (i) map the Guidance-Navigation-Control stack to shore-based operational modes -- remote supervision (RSM) and remote control (RCM) -- and identify where human unsafe control actions (Human-UCAs) concentrate in handover and emergency loops; (ii) summarize evidence that transparency features (decision rationales, alternatives, confidence/uncertainty, and rule-compliance indicators) improve understanding and support trust calibration, though reliability and predictability often dominate trust; (iii) distill design strategies for transparency at three layers: sensor/SA acquisition and fusion, HMI/eHMI presentation (textual/graphical overlays, color coding, conversational and immersive UIs), and engineer-facing processes (resilient interaction design, validation, and standardization). We integrate methods for Human-UCA identification (STPA-Cog + IDAC), quantitative trust/SA assessment, and operator workload monitoring, and outline regulatory and rule-based implications including COLREGs formalization and route exchange. We conclude with an adaptive transparency framework that couples operator state estimation with explainable decision support to reduce cognitive overload and improve takeover timeliness. The review highlights actionable figure-of-merit displays (e.g., CPA/TCPA risk bars, robustness heatmaps), transparent model outputs (rule traceability, confidence), and training pipelines (HIL/MIL, simulation) as near-term levers for safer MASS operations.
Authors:Emrecan Gulay, Eleonora Picco, Enrico Glerean, Corinna Coupette
Abstract:
When AI systems allow human-like communication, they elicit increasingly complex relational responses. Knowledge workers face a particular challenge: They approach these systems as tools while interacting with them in ways that resemble human social interaction. To understand the relational contexts that arise when humans engage with anthropomorphic conversational agents, we need to expand existing human-computer interaction frameworks. Through three workshops with qualitative researchers, we found that the fundamental ontological and relational ambiguities inherent in anthropomorphic conversational agents make it difficult for individuals to maintain consistent relational stances toward them. Our findings indicate that people's articulated positioning toward such agents often differs from the relational dynamics that occur during interactions. We propose the concept of relational dissonance to help researchers, designers, and policymakers recognize the resulting tensions in the development, deployment, and governance of anthropomorphic conversational agents and address the need for relational transparency.
Authors:Radek Ošlejšek, Radoslav Chudovský, Martin Macak
Abstract:
Hands-on training sessions become a standard way to develop and increase knowledge in cybersecurity. As practical cybersecurity exercises are strongly process-oriented with knowledge-intensive processes, process mining techniques and models can help enhance learning analytics tools. The design of our open-source analytical dashboard is backed by guidelines for visualizing multivariate networks complemented with temporal views and clustering. The design aligns with the requirements for post-training analysis of a special subset of cybersecurity exercises -- supervised Capture the Flag games. Usability is demonstrated in a case study using trainees' engagement measurement to reveal potential flaws in training design or organization.
Authors:Dashiel Carrera, Jeb Thomas-Mitchell, Daniel Wigdor
Abstract:
AI co-writing systems challenge long held ideals about agency and ownership in the creative process, thereby hindering widespread adoption. In order to address this, we investigate conceptions of agency and ownership in AI creative co-writing. Drawing on insights from a review of commercial systems, we developed three co-writing systems with identical functionality but distinct interface metaphors: agentic, tool-like, and magical. Through interviews with professional and non-professional writers (n = 18), we explored how these metaphors influenced participants' sense of control and authorship. Our analysis resulted in a taxonomy of agency and ownership subtypes and underscore how tool-like metaphors shift writers' expected points of control while agentic metaphors foreground conceptual contributions. We argue that interface metaphors not only guide expectations of control but also frame conceptions of authorship. We conclude with recommendations for the design of AI co-writing systems, emphasizing how metaphor shapes user experience and creative practice.
Authors:Yudong Huang, Avneet Singh, Mark Roman Miller
Abstract:
The sense of realism in avatar animation is a widely pursued goal in social VR applications. A common approach to enhancing realism is improving the match between avatar motion and real-world human movement. However, experience with existing VR platforms may reshape users' expectations, suggesting that matching reality is not the only path to enhancing the sense of realism. This study examines how different levels of experience with a social VR platform influence users' criteria for evaluating the realism of avatar animation. Participants were shown a set of animations varying in the degree they reflected real-world motion and motion seen on the social VR platform VRChat. Results showed that users with no VRChat experience found animations recorded on VRChat unnatural and unrealistic, but experienced users in fact rated these animations as more likely to come from a real person than the motion-capture animations. Additionally, highly experienced users recognized the intent to imitate VRChat's style and noted the differences from genuine in-platform animations. All these results suggest users' expectations of and criteria for realistic animation were shaped by their experience level. The findings support the idea that realism in avatar animation does not solely depend on mimicking real-world movement. Experience with VR platforms can shape how users expect, perceive, and evaluate animation realism. This insight can inform the design of more immersive VR environments and virtual humans in the future.
Authors:Ryan S. Yeung, David G. Black, Septimiu E. Salcudean
Abstract:
Teleoperated ultrasound can improve diagnostic medical imaging access for remote communities. Having accurate force feedback is important for enabling sonographers to apply the appropriate probe contact force to optimize ultrasound image quality. However, large time delays in communication make direct force feedback impractical. Prior work investigated using point cloud-based model-mediated teleoperation and internal potential field models to estimate contact forces and torques. We expand on this by introducing a method to update the internal potential field model of the patient with measured positions and forces for more transparent model-mediated tele-ultrasound. We first generate a point cloud model of the patient's surface and transmit this to the sonographer in a compact data structure. This is converted to a static voxelized volume where each voxel contains a potential field value. These values determine the forces and torques, which are rendered based on overlap between the voxelized volume and a point shell model of the ultrasound transducer. We solve for the potential field using a convex quadratic that combines the spatial Laplace operator with measured forces. This was evaluated on volunteer patients ($n=3$) by computing the accuracy of rendered forces. Results showed the addition of measured forces to the model reduced the force magnitude error by an average of 7.23 N and force vector angle error by an average of 9.37$^{\circ}$ compared to using only Laplace's equation.
Authors:Doreen Jirak, Pieter Maes, Armeen Saroukanoff, Dirk van Rooy
Abstract:
As autonomous technologies increasingly shape maritime operations, understanding why an AI system makes a decision becomes as crucial as what it decides. In complex and dynamic maritime environments, trust in AI depends not only on performance but also on transparency and interpretability. This paper highlights the importance of Explainable AI (XAI) as a foundation for effective human-machine teaming in the maritime domain, where informed oversight and shared understanding are essential. To support the user-centered integration of XAI, we propose a domain-specific survey designed to capture maritime professionals' perceptions of trust, usability, and explainability. Our aim is to foster awareness and guide the development of user-centric XAI systems tailored to the needs of seafarers and maritime teams.
Authors:Tenghao Ji, Eytan Adar
Abstract:
Images play a vital role in improving the readability and comprehension of Wikipedia articles by serving as `illustrative aids.' However, not all images are equally effective and not all Wikipedia editors are trained in their selection. We propose QuizRank, a novel method of image selection that leverages large language models (LLMs) and vision language models (VLMs) to rank images as learning interventions. Our approach transforms textual descriptions of the article's subject into multiple-choice questions about important visual characteristics of the concept. We utilize these questions to quiz the VLM: the better an image can help answer questions, the higher it is ranked. To further improve discrimination between visually similar items, we introduce a Contrastive QuizRank that leverages differences in the features of target (e.g., a Western Bluebird) and distractor concepts (e.g., Mountain Bluebird) to generate questions. We demonstrate the potential of VLMs as effective visual evaluators by showing a high congruence with human quiz-takers and an effective discriminative ranking of images.
Authors:Sander de Jong, Rune Møberg Jacobsen, Niels van Berkel
Abstract:
Large language models (LLMs) are increasingly used in group decision-making, but their influence risks fostering conformity and reducing epistemic vigilance. Drawing on the Argumentative Theory of Reasoning, we argue that confirmation bias, often seen as detrimental, can be harnessed as a resource when paired with critical evaluation. We propose a three-step process in which individuals first generate ideas independently, then use LLMs to refine and articulate them, and finally engage with LLMs as epistemic provocateurs to anticipate group critique. This framing positions LLMs as tools for scaffolding disagreement, helping individuals prepare for more productive group discussions.
Authors:Chengjian Xu, Yonghao Song, Zelin Liao, Haochuan Zhang, Qiong Wang, Qingqing Zheng
Abstract:
Decoding visual information from time-resolved brain recordings, such as EEG and MEG, plays a pivotal role in real-time brain-computer interfaces. However, existing approaches primarily focus on direct brain-image feature alignment and are limited to single-task frameworks or task-specific models. In this paper, we propose a Unified MultItask Network for zero-shot M/EEG visual Decoding (referred to UMind), including visual stimulus retrieval, classification, and reconstruction, where multiple tasks mutually enhance each other. Our method learns robust neural-visual and semantic representations through multimodal alignment with both image and text modalities. The integration of both coarse and fine-grained texts enhances the extraction of these neural representations, enabling more detailed semantic and visual decoding. These representations then serve as dual conditional inputs to a pre-trained diffusion model, guiding visual reconstruction from both visual and semantic perspectives. Extensive evaluations on MEG and EEG datasets demonstrate the effectiveness, robustness, and biological plausibility of our approach in capturing spatiotemporal neural dynamics. Our approach sets a multitask pipeline for brain visual decoding, highlighting the synergy of semantic information in visual feature extraction.
Authors:Jorge Garza, Steven Swanson
Abstract:
Within PCB design, the reuse of circuit design blocks is a major preventing factor inhibiting beginners from reusing designs made by experts, a common practice in software but non-existent in circuit design at large. Despite efforts to improve reusability (e.g. block-based PCB design) by platforms such as SparkFun ALC and Altium Upverter, they lack merging techniques that safely guide users in connecting different circuit blocks without requiring assistance from third-party engineers. In this paper, we propose TypedSchematics, a block-based standalone PCB design tool that supports beginners create their own PCBs by providing a language syntax for typing circuit blocks with circuit data that addresses multiple challenges, from real-time detection of connection errors to automated composition and user-scalable libraries of circuit blocks. Through a user study, we demonstrate TypedSchematics improvements in design support for merging circuit blocks compared to Fusion 360. Three PCBs designed with TypedSchematics further showcase our tool capabilities, one designed by high school students demonstrates the potential of TypedSchematics to significantly lower the PCB design skill-floor.
Authors:Pradyumna Shome, Sashreek Krishnan, Sauvik Das
Abstract:
There is growing imprecision about what "AI agents" are, what they can do, and how effectively they can be used by their intended users. We pose two key research questions: (i) How does the tech industry conceive of and market "AI agents"? (ii) What challenges do end-users face when attempting to use commercial AI agents for their advertised uses? We first performed a systematic review of marketed use cases for 102 commercial AI agents, finding that they fall into three umbrella categories: orchestration, creation, and insight. Next, we conducted a usability assessment where N = 31 participants attempted representative tasks for each of these categories on two popular commercial AI agent tools: Operator and Manus. We found that users were generally impressed with these agents but faced several critical usability challenges ranging from agent capabilities that were misaligned with user mental models to agents lacking the meta-cognitive abilities necessary for effective collaboration.
Authors:Russell Beale, Daniel Rutter
Abstract:
This paper presents the development of an interactive system for constructing Augmented Virtual Environments (AVEs) by fusing mobile phone images with open-source geospatial data. By integrating 2D image data with 3D models derived from sources such as OpenStreetMap (OSM) and Digital Terrain Models (DTM), the proposed system generates immersive environments that enhance situational context. The system leverages Python for data processing and Unity for 3D visualization, interconnected via UDP-based two-way communication. Preliminary user evaluation demonstrates that the resulting AVEs accurately represent real-world scenes and improve users' contextual understanding. Key challenges addressed include projector calibration, precise model construction from heterogeneous data, and object detection for dynamic scene representation.
Authors:Julia S. Dollis, Iago A. Brito, Fernanda B. Färber, Pedro S. F. B. Ribeiro, Rafael T. Sousa, Arlindo R. Galvão Filho
Abstract:
While virtual reality (VR) excels at simulating physical environments, its effectiveness for training complex interpersonal skills is limited by a lack of psychologically plausible virtual humans. This is a critical gap in high-stakes domains like medical education, where communication is a core competency. This paper introduces a framework that integrates large language models (LLMs) into immersive VR to create medically coherent virtual patients with distinct, consistent personalities, built on a modular architecture that decouples personality from clinical data. We evaluated our system in a mixed-method, within-subjects study with licensed physicians who engaged in simulated consultations. Results demonstrate that the approach is not only feasible but is also perceived by physicians as a highly rewarding and effective training enhancement. Furthermore, our analysis uncovers critical design principles, including a ``realism-verbosity paradox" where less communicative agents can seem more artificial, and the need for challenges to be perceived as authentic to be instructive. This work provides a validated framework and key insights for developing the next generation of socially intelligent VR training environments.
Authors:Steven Watterson, Sarah Atkinson, Elaine Murray, Andrew McDowell
Abstract:
The arrival of AI tools and in particular Large Language Models (LLMs) has had a transformative impact on teaching and learning and institutes are still trying to determine how to integrate LLMs into education in constructive ways. Here, we explore the adoption of LLM-based tools into two teaching programmes, one undergraduate and one postgraduate. We provided to our classes (1) a LLM-powered chatbot that had access to course materials by RAG and (2) AI-generated audio-only podcasts for each week$\text{'}$s teaching material. At the end of the semester, we surveyed the classes to gauge attitudes towards these tools. The classes were small and from biological courses. The students felt positive about AI generally and that AI tools made a positive impact on teaching. Students found the LLM-powered chatbot easy and enjoyable to use and felt that it enhanced their learning. The podcasts were less popular and only a small proportion of the class listened weekly. The class as a whole was indifferent to whether the podcasts should be used more widely across courses, but those who listened enjoyed them and were in favour.
Authors:Gustavo Kruger, Nikhil Sachdeva, Michael Sobolev
Abstract:
Smartphone usage data can provide valuable insights for understanding interaction with technology and human behavior. However, collecting large-scale, in-the-wild smartphone usage logs is challenging due to high costs, privacy concerns, under representative user samples and biases like non-response that can skew results. These challenges call for exploring alternative approaches to obtain smartphone usage datasets. In this context, large language models (LLMs) such as Open AI's ChatGPT present a novel approach for synthetic smartphone usage data generation, addressing limitations of real-world data collection. We describe a case study on how four prompt strategies influenced the quality of generated smartphone usage data. We contribute with insights on prompt design and measures of data quality, reporting a prompting strategy comparison combining two factors, prompt level of detail (describing a user persona, describing the expected results characteristics) and seed data inclusion (with versus without an initial real usage example). Our findings suggest that using LLMs to generate structured and behaviorally plausible smartphone use datasets is feasible for some use cases, especially when using detailed prompts. Challenges remain in capturing diverse nuances of human behavioral patterns in a single synthetic dataset, and evaluating tradeoffs between data fidelity and diversity, suggesting the need for use-case-specific evaluation metrics and future research with more diverse seed data and different LLM models.
Authors:Xinxu Zhou, Jiaqi Bai, Zhenqi Sun, Fanxiang Zeng, Yue Liu
Abstract:
Although significant progress has been made in many tasks within the field of Natural Language Processing (NLP), Controlled Text Generation (CTG) continues to face numerous challenges, particularly in achieving fine-grained conditional control over generation. Additionally, in real scenario and online applications, cost considerations, scalability, domain knowledge learning and more precise control are required, presenting more challenge for CTG. This paper introduces a novel and scalable framework, AgentCTG, which aims to enhance precise and complex control over the text generation by simulating the control and regulation mechanisms in multi-agent workflows. We explore various collaboration methods among different agents and introduce an auto-prompt module to further enhance the generation effectiveness. AgentCTG achieves state-of-the-art results on multiple public datasets. To validate its effectiveness in practical applications, we propose a new challenging Character-Driven Rewriting task, which aims to convert the original text into new text that conform to specific character profiles and simultaneously preserve the domain knowledge. When applied to online navigation with role-playing, our approach significantly enhances the driving experience through improved content delivery. By optimizing the generation of contextually relevant text, we enable a more immersive interaction within online communities, fostering greater personalization and user engagement.
Authors:Harper Reed, Michael Sugimura, Angelo Zangari
Abstract:
We investigate whether giving LLM agents the collaborative tools and autonomy that humans naturally use for problem solving can improve their performance. We equip Claude Code agents with MCP-based social media and journaling tools and allow them to use these tools as they see fit. Across 34 Aider Polyglot Python programming challenges, collaborative tools substantially improve performance on the hardest problems, delivering 15-40% lower cost, 12-27% fewer turns, and 12-38% faster completion than baseline agents. Effects on the full challenge set are mixed, suggesting these tools act as performance enhancers when additional reasoning scaffolding is most needed. Surprisingly, Different models naturally adopted distinct collaborative strategies without explicit instruction. Sonnet 3.7 engaged broadly across tools and benefited from articulation-based cognitive scaffolding. Sonnet 4 showed selective adoption, leaning on journal-based semantic search when problems were genuinely difficult. This mirrors how human developers adjust collaboration based on expertise and task complexity. Behavioral analysis shows agents prefer writing over reading by about 2-9x, indicating that structured articulation drives much of the improvement rather than information access alone. Overall, AI agents can systematically benefit from human-inspired collaboration tools at the edge of their capabilities, pointing to adaptive collaborative interfaces as reasoning enhancers rather than universal efficiency boosts.
Authors:Priyanka Nanayakkara, Elena Ghazi, Salil Vadhan
Abstract:
Differential privacy (DP) -- a principled approach to producing statistical data products with strong, mathematically provable privacy guarantees for the individuals in the underlying dataset -- has seen substantial adoption in practice over the past decade. Applying DP requires making several implementation decisions, each with significant impacts on data privacy and/or utility. Hence, to promote shared learning and accountability around DP deployments, Dwork, Kohli, and Mulligan (2019) proposed a public-facing repository ("registry") of DP deployments. The DP community has recently started to work toward realizing this vision. We contribute to this effort by (1) developing a holistic, hierarchical schema to describe any given DP deployment and (2) designing and implementing an interactive interface to act as a registry where practitioners can access information about past DP deployments. We (3) populate our interface with 21 real-world DP deployments and (4) conduct an exploratory user study with DP practitioners ($n=16$) to understand how they would use the registry, as well as what challenges and opportunities they foresee around its adoption. We find that participants were enthusiastic about the registry as a valuable resource for evaluating prior deployments and making future deployments. They also identified several opportunities for the registry, including that it can become a "hub" for the community and support broader communication around DP (e.g., to legal teams). At the same time, they identified challenges around the registry gaining adoption, including the effort and risk involved with making implementation choices public and moderating the quality of entries. Based on our findings, we offer recommendations for encouraging adoption and increasing the registry's value not only to DP practitioners, but also to policymakers, data users, and data subjects.
Authors:Louisa Conwill, Megan Levis Scheirer, Walter Scheirer
Abstract:
Subsidiarity is a principle of social organization that promotes human dignity and resists over-centralization by balancing personal autonomy with intervention from higher authorities only when necessary. Thus it is a relevant, but not previously explored, critical lens for discerning the tradeoffs between complete user control of software and surrendering control to "big tech" for convenience, as is common in surveillance capitalism. Our study explores data privacy through the lens of subsidiarity: we employ a multi-method approach of data flow monitoring and user interviews to determine the level of control different everyday technologies currently operate at, and the level of control everyday computer users think is necessary. We found that chat platforms like Slack and Discord violate subsidiarity the most. Our work provides insight into when users are willing to surrender privacy for convenience and demonstrates how subsidiarity can inform designs that promote human flourishing.
Authors:Hemil Mehta, Tanvi Raut, Kohav Yadav, Edward F. Gehringer
Abstract:
This full research-to-practice paper explores approaches for developing course chatbots by comparing low-code platforms and custom-coded solutions in educational contexts. With the rise of Large Language Models (LLMs) like GPT-4 and LLaMA, LLM-based chatbots are being integrated into teaching workflows to automate tasks, provide assistance, and offer scalable support. However, selecting the optimal development strategy requires balancing ease of use, customization, data privacy, and scalability. This study compares two development approaches: low-code platforms like AnythingLLM and Botpress, with custom-coded solutions using LangChain, FAISS, and FastAPI. The research uses Prompt engineering, Retrieval-augmented generation (RAG), and personalization to evaluate chatbot prototypes across technical performance, scalability, and user experience. Findings indicate that while low-code platforms enable rapid prototyping, they face limitations in customization and scaling, while custom-coded systems offer more control but require significant technical expertise. Both approaches successfully implement key research principles such as adaptive feedback loops and conversational continuity. The study provides a framework for selecting the appropriate development strategy based on institutional goals and resources. Future work will focus on hybrid solutions that combine low-code accessibility with modular customization and incorporate multimodal input for intelligent tutoring systems.
Authors:Nadim Barakat, William Lotter
Abstract:
Diabetic retinopathy (DR) is a leading cause of blindness worldwide, and AI systems can expand access to fundus photography screening. Current FDA-cleared systems primarily provide binary referral outputs, where this minimal output may limit clinical trust and utility. Yet, determining the most effective output format to enhance clinician-AI performance is an empirical challenge that is difficult to assess at scale. We evaluated multimodal large language models (MLLMs) for DR detection and their ability to simulate clinical AI assistance across different output types. Two models were tested on IDRiD and Messidor-2: GPT-4o, a general-purpose MLLM, and MedGemma, an open-source medical model. Experiments included: (1) baseline evaluation, (2) simulated AI assistance with synthetic predictions, and (3) actual AI-to-AI collaboration where GPT-4o incorporated MedGemma outputs. MedGemma outperformed GPT-4o at baseline, achieving higher sensitivity and AUROC, while GPT-4o showed near-perfect specificity but low sensitivity. Both models adjusted predictions based on simulated AI inputs, but GPT-4o's performance collapsed with incorrect ones, whereas MedGemma remained more stable. In actual collaboration, GPT-4o achieved strong results when guided by MedGemma's descriptive outputs, even without direct image access (AUROC up to 0.96). These findings suggest MLLMs may improve DR screening pipelines and serve as scalable simulators for studying clinical AI assistance across varying output configurations. Open, lightweight models such as MedGemma may be especially valuable in low-resource settings, while descriptive outputs could enhance explainability and clinician trust in clinical workflows.
Authors:Philipp Proff, Marian Dörk
Abstract:
We present a web-based environment that connects annotation, abstraction, and argumentation during the interpretation of text. As a visual interface for scholarly reading and writing, Textarium combines human analysis with lightweight computational processing to bridge close and distant reading practices. Readers can highlight text, group keywords into concepts, and embed these observations as anchors in essays. The interface renders these interpretive actions as parameterized visualization states. Through a speculative design process of co-creative and iterative prototyping, we developed a reading-writing approach that makes interpretive processes transparent and shareable within digital narratives.
Authors:Mete Harun Akcay, Siddharth Prakash Rao, Alexandros Bakas, Buse Gul Atli
Abstract:
User-generated content, such as photos, comprises the majority of online media content and drives engagement due to the human ability to process visual information quickly. Consequently, many online platforms are designed for sharing visual content, with billions of photos posted daily. However, photos often reveal more than they intended through visible and contextual cues, leading to privacy risks. Previous studies typically treat privacy as a property of the entire image, overlooking individual objects that may carry varying privacy risks and influence how users perceive it. We address this gap with a mixed-methods study (n = 92) to understand how users evaluate the privacy of images containing multiple sensitive objects. Our results reveal mental models and nuanced patterns that uncover how granular details, such as photo-capturing context and co-presence of other objects, affect privacy perceptions. These novel insights could enable personalized, context-aware privacy protection designs on social media and future technologies.
Authors:David Hunter, Pablo Botin, Emily Snode-Brenneman, Amy Stevermer, Becca Hatheway, Dillon Amaya, Eddie Goldstein, Wayne A Seltzer, Mark D Gross, Kris Karnauskas, Daniel Leithinger, Ellen Yi-Luen Do
Abstract:
We describe a multidisciplinary collaboration to iteratively design an interactive exhibit for a public science center on paleoclimate, the study of past climates. We created a data physicalisation of mountains and ice sheets that can be tangibly manipulated by visitors to interact with a wind simulation visualisation that demonstrates how the climate of North America differed dramatically between now and the peak of the last ice age. We detail the system for interaction and visualisation plus design choices to appeal to an audience that ranges from children to scientists and responds to site requirements.
Authors:Yang Chen, Zhijun Zhang
Abstract:
Automation significantly alters human behavior, particularly risk-taking. Previous researches have paid limited attention to the underlying characteristics of automation and their mechanisms of influence on risk-taking. This study investigated how automation affects risk-taking and examined the role of sense of agency therein. By quantifying sense of agency through subjective ratings, this research explored the impact of automation level and reliability level on risk-taking. The results of three experiments indicated that automation reduced the level of risk-taking; higher automation level was associated with lower sense of agency and lower risk-taking, with sense of agency playing a complete mediating role; higher automation reliability was associated with higher sense of agency and higher risk-taking, with sense of agency playing a partial mediating role. The study concludes that automation influences risk-taking, such that higher automation level or lower reliability is associated with a lower likelihood of risk-taking. Sense of agency mediates the impact of automation on risk-taking, and automation level and reliability have different effects on risk-taking.
Authors:Shao-Yu Chu, Yuhe Tian, Yu-Xiang Wang, Haojian Jin
Abstract:
This paper explores how programmers without specialized expertise in differential privacy (DP) (i.e., novices) can leverage LLMs to implement DP programs with minimal training. We first conducted a need-finding study with 6 novices and 3 experts to understand how they utilize LLMs in DP implementation. While DP experts can implement correct DP analyses through a few prompts, novices struggle to articulate their requirements in prompts and lack the skills to verify the correctness of the generated code. We then developed DPCheatSheet, an instructional tool that helps novices implement DP using LLMs. DPCheatSheet combines two learning concepts: it annotates an expert's workflow with LLMs as a worked example to bridge the expert mindset to novices, and it presents five common mistakes in LLM-based DP code generation as erroneous examples to support error-driven learning. We demonstrated the effectiveness of DPCheatSheet with an error identification study and an open-ended DP implementation study.
Authors:T. James Brandt, Cecilia Xi Wang
Abstract:
Generative AI powers a growing wave of companion chatbots, yet principles for fostering genuine connection remain unsettled. We test two routes: visible user authorship versus covert language-style mimicry. In a preregistered 3x2 experiment (N = 162), we manipulated user-controlled avatar generation (none, premade, user-generated) and Language Style Matching (LSM) (static vs. adaptive). Generating an avatar boosted rapport ($Ï^2$ = .040, p = .013), whereas adaptive LSM underperformed static style on personalization and satisfaction (d = 0.35, p = .009) and was paradoxically judged less adaptive (t = 3.07, p = .003, d = 0.48). We term this an Adaptation Paradox: synchrony erodes connection when perceived as incoherent, destabilizing persona. To explain, we propose a stability-and-legibility account: visible authorship fosters natural interaction, while covert mimicry risks incoherence. Our findings suggest designers should prioritize legible, user-driven personalization and limit stylistic shifts rather than rely on opaque mimicry.
Authors:Helene Tenzer, Oumnia Abidi, Stefan Feuerriegel
Abstract:
Large language models (LLMs) are increasingly used in everyday communication, including multilingual interactions across different cultural contexts. While LLMs can now generate near-perfect literal translations, it remains unclear whether LLMs support culturally appropriate communication. In this paper, we analyze the cultural sensitivity of different LLM designs when applied to English-Japanese translations of workplace e-mails. Here, we vary the prompting strategies: (1) naive "just translate" prompts, (2) audience-targeted prompts specifying the recipient's cultural background, and (3) instructional prompts with explicit guidance on Japanese communication norms. Using a mixed-methods study, we then analyze culture-specific language patterns to evaluate how well translations adapt to cultural norms. Further, we examine the appropriateness of the tone of the translations as perceived by native speakers. We find that culturally-tailored prompting can improve cultural fit, based on which we offer recommendations for designing culturally inclusive LLMs in multilingual settings.
Authors:Peterson Jean, Emma Murphy, Enda Bates
Abstract:
As the ageing population grows, older adults increasingly rely on wearable devices to monitor chronic conditions. However, conventional health data representations (HDRs) often present accessibility challenges, particularly for critical health parameters like blood pressure and sleep data. This study explores how older adults interact with these representations, identifying key barriers such as semantic inconsistency and difficulties in understanding. While research has primarily focused on data collection, less attention has been given to how information is output and understood by end-users. To address this, an end-user evaluation was conducted with 16 older adults (65+) in a structured workshop, using think-aloud protocols and participatory design activities. The findings highlight the importance of affordance and familiarity in improving accessibility, emphasising the familiarity and potential of multimodal cues. This study bridges the gap between domain experts and end-users, providing a replicable methodological approach for designing intuitive, multisensory HDRs that better align with older adults' needs and abilities.
Authors:Francesco Febbraio, Simona Collina, Christina Lepida, Panagiotis Kourtesis
Abstract:
Colour is a fundamental determinant of affective experience in immersive virtual reality (VR), yet the emotional and physiological impact of individual hues remains poorly characterised. This study investigated how fifteen calibrated Munsell hues influence subjective and autonomic responses when presented in immersive VR. Thirty-six adults (18-45 years) viewed each hue in a within-subject design while pupil diameter and skin conductance were recorded continuously, and self-reported emotions were assessed using the Self-Assessment Manikin across pleasure, arousal, and dominance. Repeated-measures ANOVAs revealed robust hue effects on all three self-report dimensions and on pupil dilation, with medium to large effect sizes. Reds and red-purple hues elicited the highest arousal and dominance, whereas blue-green hues were rated most pleasurable. Pupil dilation closely tracked arousal ratings, while skin conductance showed no reliable hue differentiation, likely due to the brief (30 s) exposures. Individual differences in cognitive style and personality modulated overall reactivity but did not alter the relative ranking of hues. Taken together, these findings provide the first systematic hue-by-hue mapping of affective and physiological responses in immersive VR. They demonstrate that calibrated colour shapes both experience and ocular physiology, while also offering practical guidance for educational, clinical, and interface design in virtual environments.
Authors:Lingyun Chen, Qing Xiao, Zitao Zhang, Eli Blevis, Selma Å abanoviÄ
Abstract:
Many current robot designs prioritize efficiency and one-size-fits-all solutions, oftentimes overlooking personalization, adaptability, and sustainability. To explore alternatives, we conducted two co-design workshops with 23 participants, who engaged with a modular robot co-design framework. Using components we provided as building blocks, participants combined, removed, and invented modules to envision how modular robots could accompany them from childhood through adulthood and into older adulthood. The participants' designs illustrate how modularity (a) enables personalization through open-ended configuration, (b) adaptability across shifting life-stage needs, and (c) sustainability through repair, reuse, and continuity. We therefore derive design principles that establish modularity as a foundation for lifespan-oriented human-robot interaction. This work reframes modular robotics as a flexible and expressive co-design approach, supporting robots that evolve with people, rather than static products optimized for single moments or contexts of use.
Authors:Lin Lin, Ming Wu, Anyu Ren, Zhanwei Wu, Daojun Gong, Ruowei Xiao
Abstract:
In virtual or hybrid co-present events, biodata is emerging as a new paradigm of social cues. While it is able to reveal individuals' inner states, the technology-mediated representation of biodata in social contexts remains underexplored. This study aims to uncover human cognitive preferences and patterns for biodata expression and leverage this knowledge to guide generative AI (GenAI) in creating biodata representations for co-present experiences, aligning with the broader concept of Human-in-the-loop. We conducted a user elicitation workshop with 30 HCI experts and investigated the results using qualitative analysis. Based on our findings, we further propose a GenAI-driven framework: BioMetaphor. Our framework demonstration shows that current GenAI can learn and express visual biodata cues in an event-adpated, human-like manner. This human-centered approach engages users in research, revealing the underlying cognition constructions for biodata expression while demonstrating how such knowledge can inform the design and development of future empathic technologies.
Authors:Jaeung Franciskus Yoo, Huy Kang Kim
Abstract:
Speedrun, a practice of completing a game as quickly as possible, has fostered vibrant communities driven by creativity, competition, and mastery of game mechanics and motor skills. However, this contest also attracts malicious actors as financial incentives come into play. As media and software manipulation techniques advance - such as spliced footage, modified game software and live stream with staged setups - forged speedruns have become increasingly difficult to detect. Volunteer-driven communities invest significant effort to verify submissions, yet the process remains slow, inconsistent, and reliant on informal expertise. In high-profile cases, fraudulent runs have gone undetected for years, allowing perpetrators to gain fame and financial benefits through monetised viewership, sponsorships, donations, and community bounties. To address this gap, we propose Tracer, Tamper Recognition via Analysis of Continuity and Events in game Runs, a modular framework for identifying artefacts of manipulation in speedrun submissions. Tracer provides structured guidelines across audiovisual, physical, and cyberspace dimensions, systematically documenting dispersed in-game knowledge and previously reported fraudulent cases to enhance verification efficiency.
Authors:Boris Kovalerchuk, Brent D. Fegley
Abstract:
Difficult decision-making problems abound in various disciplines and domains. The proliferation of generative techniques, especially large language models (LLMs), has excited interest in using them for decision support. However, LLMs cannot yet resolve missingness in their training data, leading to hallucinations. Retrieval-Augmented Generation (RAG) enhances LLMs by incorporating external information retrieval, reducing hallucinations and improving accuracy. Yet, RAG and related methods are only partial solutions, as they may lack access to all necessary sources or key missing information. Even everyday issues often challenge LLMs' abilities. Submitting longer prompts with context and examples is one approach to address knowledge gaps, but designing effective prompts is non-trivial and may not capture complex mental models of domain experts. For tasks with missing critical information, LLMs are insufficient, as are many existing systems poorly represented in available documents. This paper explores how LLMs can make decision-making more efficient, using a running example of evaluating whether to respond to a call for proposals. We propose a technology based on optimized human-machine dialogue and monotone Boolean and k-valued functions to discover a computationally tractable personal expert mental model (EMM) of decision-making. Our EMM algorithm for LLM prompt engineering has four steps: (1) factor identification, (2) hierarchical structuring of factors, (3) generating a generalized expert mental model specification, and (4) generating a detailed generalized expert mental model from that specification.
Authors:Zhi Hua Jin, Kurt Xiao, David Hyde
Abstract:
In this paper, we develop a virtual laboratory for measuring human trust. Our laboratory, which is realized as a web application, enables researchers to show pre-recorded or live video feeds to groups of users in a synchronized fashion. Users are able to provide real-time feedback on these videos via affect buttons and a freeform chat interface. We evaluate our application via a quantitative user study ($N \approx 80$) involving videos of cyber-physical systems, such as autonomous vehicles, performing positively or negatively. Using data collected from user responses in the application, as well as customized survey instruments assessing different facets of trust, we find that human trust in cyber-physical systems can be affected merely by remotely observing the behavior of such systems, without ever encountering them in person.
Authors:Erik Bertram, Nina Hollender, Sebastian Juhl, Sandra Loop, Martin Schrepp
Abstract:
Converting customer survey feedback data into usable insights has always been a great challenge for large software enterprises. Despite the improvements on this field, a major obstacle often remains when drawing the right conclusions out of the data and channeling them into the software development process. In this paper we present a practical end-to-end approach of how to extract useful information out of a data set and leverage the information to drive change. We describe how to choose the right metrics to measure, gather appropriate feedback from customer end-users, analyze the data by leveraging methods from inferential statistics, make the data transparent, and finally drive change with the results. Furthermore, we present an example of a UX prototype dashboard that can be used to communicate the analyses to stakeholders within the company.
Authors:Clara Toussaint, Benjamin Chateau, Pierre-Guillaume Gourio-Jewell, Emilie Bonnefoy, Nicolas Louveton
Abstract:
Authentication is the cornerstone of information security in our daily lives. However, disabled users such as Blind and Low-Vision (BLV) ones are left behind in digital services due to the lack of accessibility. According to the World Health Organization, 36 million people are blind worldwide. It is estimated that there will be 115 million by 2050, due to the ageing of the population. Yet accessing digital services has become increasingly essential. At the same time, cyber threats targeting individuals have also increased strongly in the last few years. The ALIAS project addresses the need for accessible digital authentication solutions for BLV users facing challenges with digital technology. Security systems can inhibit access for these individuals as they become more complex. This project aims to create a barrier-free authentication system based on cognitive ergonomics and user experience (UX) design methods specifically for BLV users. This paper presents an overview of current research in this area. We also identify research gaps, and finally, we present our project's methodology and approach. First, we will build a knowledge base on the digital practices and cognitive models of BLV users during authentication. This information will support the development of prototypes, which will be tested and refined through two iterations before finalizing the operational version.
Authors:Soobin Cho, Mark Zachry, David W. McDonald
Abstract:
Online spaces involve diverse communities engaging in various forms of collaboration, which naturally give rise to discussions, some of which inevitably escalate into conflict or disputes. To address such situations, AI has primarily been used for moderation. While moderation systems are important because they help maintain order, common moderation strategies of removing or suppressing content and users rarely address the underlying disagreements or the substantive content of disputes. Mediation, by contrast, fosters understanding, reduces emotional tension, and facilitates consensus through guided negotiation. Mediation not only enhances the quality of collaborative decisions but also strengthens relationships among group members. For this reason, we argue for shifting focus toward AI-supported mediation. In this work, we propose an information-focused framework for AI-supported mediation designed for community-based collaboration. Within this framework, we hypothesize that AI must acquire and reason over three key types of information: content, culture, and people.
Authors:Tawfiq Ammari, Zarah Khondoker, Yihan Wang, Nikki Roda
Abstract:
Men experiencing infertility face unique challenges navigating Traditional Masculinity Ideologies that discourage emotional expression and help-seeking. This study examines how Reddit's r/maleinfertility community helps overcome these barriers through digital support networks. Using topic modeling (115 topics), network analysis (11 micro-communities), and time-lagged regression on 11,095 posts and 79,503 comments from 8,644 users, we found the community functions as a hybrid space: informal diagnostic hub, therapeutic commons, and governed institution. Medical advice dominates discourse (63.3\%), while emotional support (7.4\%) and moderation (29.2\%) create essential infrastructure. Sustained engagement correlates with actionable guidance and affiliation language, not emotional processing. Network analysis revealed structurally cohesive but topically diverse clusters without echo chamber characteristics. Cross-posters (20\% of users) who bridge r/maleinfertility and the gender-mixed r/infertility community serve as navigators and mentors, transferring knowledge between spaces. These findings inform trauma-informed design for stigmatized health communities, highlighting role-aware systems and navigation support.
Authors:Hasibur Rahman, Smit Desai
Abstract:
Large language models (LLMs) enable conversational agents (CAs) to express distinctive personalities, raising new questions about how such designs shape user perceptions. This study investigates how personality expression levels and user-agent personality alignment influence perceptions in goal-oriented tasks. In a between-subjects experiment (N=150), participants completed travel planning with CAs exhibiting low, medium, or high expression across the Big Five traits, controlled via our novel Trait Modulation Keys framework. Results revealed an inverted-U relationship: medium expression produced the most positive evaluations across Intelligence, Enjoyment, Anthropomorphism, Intention to Adopt, Trust, and Likeability, significantly outperforming both extremes. Personality alignment further enhanced outcomes, with Extraversion and Emotional Stability emerging as the most influential traits. Cluster analysis identified three distinct compatibility profiles, with "Well-Aligned" users reporting substantially positive perceptions. These findings demonstrate that personality expression and strategic trait alignment constitute optimal design targets for CA personality, offering design implications as LLM-based CAs become increasingly prevalent.
Authors:Yixuan Gao, Tanvir Ahmed, Shuang He, Zhongqi Cheng, Rajalakshmi Nandakumar
Abstract:
Soil moisture monitoring is essential for agriculture and environmental management, yet existing methods require either invasive probes disturbing the soil or specialized equipment, limiting access to the public. We present SoilSound, an ubiquitous accessible smartphone-based acoustic sensing system that can measure soil moisture without disturbing the soil. We leverage the built-in speaker and microphone to perform a vertical scan mechanism to accurately measure moisture without any calibration. Unlike existing work that use transmissive properties, we propose an alternate model for acoustic reflections in soil based on the surface roughness effect to enable moisture sensing without disturbing the soil. The system works by sending acoustic chirps towards the soil and recording the reflections during a vertical scan, which are then processed and fed to a convolutional neural network for on-device soil moisture estimation with negligible computational, memory, or power overhead. We evaluated the system by training with curated soils in boxes in the lab and testing in the outdoor fields and show that SoilSound achieves a mean absolute error (MAE) of 2.39% across 10 different locations. Overall, the evaluation shows that SoilSound can accurately track soil moisture levels ranging from 15.9% to 34.0% across multiple soil types, environments, and users; without requiring any calibration or disturbing the soil, enabling widespread moisture monitoring for home gardeners, urban farmers, citizen scientists, and agricultural communities in resource-limited settings.
Authors:Yuan He, Brendan Rooney, Rachel McDonnell
Abstract:
Mixed Reality (MR) presents novel opportunities to investigate how individuals perceive themselves and others during shared, augmented experiences within a common physical environment. Previous research has demonstrated that users can embody avatars in MR, temporarily extending their sense of self. However, there has been limited exploration of body-swapping, a condition in which two individuals simultaneously inhabit each other's avatars, and its potential effects on social interaction in immersive environments. To address this gap, we adapted the Joint Simon Task (JST), a well-established implicit paradigm, to examine how body-swapping influences the cognitive and perceptual boundaries between self and other. Our results indicate that body-swapping led participants to experience themselves and their partner as functioning like a single, unified system, as in two bodies operating as one agent. This suggests possible cognitive and perceptual changes that go beyond simple collaboration. Our findings have significant implications for the design of MR systems intended to support collaboration, empathy, social learning, and therapeutic interventions through shared embodiment.
Authors:Ninad Bhat, Kieran Browne, Pip Bingemann
Abstract:
We introduce Creativity Benchmark, an evaluation framework for large language models (LLMs) in marketing creativity. The benchmark covers 100 brands (12 categories) and three prompt types (Insights, Ideas, Wild Ideas). Human pairwise preferences from 678 practising creatives over 11,012 anonymised comparisons, analysed with Bradley-Terry models, show tightly clustered performance with no model dominating across brands or prompt types: the top-bottom spread is $Îθ\approx 0.45$, which implies a head-to-head win probability of $0.61$; the highest-rated model beats the lowest only about $61\%$ of the time. We also analyse model diversity using cosine distances to capture intra- and inter-model variation and sensitivity to prompt reframing. Comparing three LLM-as-judge setups with human rankings reveals weak, inconsistent correlations and judge-specific biases, underscoring that automated judges cannot substitute for human evaluation. Conventional creativity tests also transfer only partially to brand-constrained tasks. Overall, the results highlight the need for expert human evaluation and diversity-aware workflows.
Authors:Brittany Harbison, Samuel Taubman, Travis Taylor, Ashok. K. Goel
Abstract:
Social connection is a vital part of learning, yet online course environments present barriers to the organic formation of social groups. SAMI offers one solution by facilitating student connections, but its effectiveness is constrained by an incomplete Theory of Mind, limiting its ability to create an effective mental model of a student. One facet of this is its inability to intuit personality, which may influence the relevance of its recommendations. To explore this, we propose a personality detection model utilizing GPTs zero-shot capability to infer Big-Five personality traits from forum introduction posts, often encouraged in online courses. We benchmark its performance against established models, demonstrating its efficacy in this task. Furthermore, we integrate this model into SAMIs entity-based matchmaking system, enabling personality-informed social recommendations. Initial integration suggests personality traits can complement existing matching factors, though additional evaluation is required to determine their full impact on student engagement and match quality.
Authors:Muhannad Ismael, Maël Cornil
Abstract:
This paper presents an outdoor tracking system using Real-Time Kinematic (RTK) positioning and Optical See-Through Head Mounted Display(s) (OST-HMD(s)) in urban areas where the accurate tracking of objects is critical and where displaying occluded information is important for safety reasons. The approach presented here replaces 2D screens/tablets and offers distinct advantages, particularly in scenarios demanding hands-free operation. The integration of RTK, which provides centimeter-level accuracy of tracked objects, with OST-HMD represents a promising solution for outdoor applications. This paper provides valuable insights into leveraging the combined potential of RTK and OST-HMD for outdoor tracking tasks from the perspectives of systems integration, performance optimization, and usability. The main contributions of this paper are: \textbf{1)} a system for seamlessly merging RTK systems with OST-HMD to enable relatively precise and intuitive outdoor tracking, \textbf{2)} an approach to determine a global location to achieve the position relative to the world, \textbf{3)} an approach referred to as 'semi-dynamic' for system assessment. Moreover, we offer insights into several relevant future research topics aimed at improving the OST-HMD and RTK hybrid system for outdoor tracking.
Authors:Stefan Resch, André Kousha, Anna Carroll, Noah Severinghaus, Felix Rehberg, Marco Zatschker, Yunus Söyleyici, Daniel Sanchez-Morillo
Abstract:
Smart assistive technologies such as sensor-based footwear and walking aids offer promising opportunities to support rehabilitation through real-time feedback and patient-centered monitoring. However, most orthotic devices remain passive and lack integrated sensing or feedback functionalities, while existing research often focuses on isolated prototypes rather than cohesive, interactive systems. In this work, we present the design and implementation of a novel modular sensor system that combines a smart foot orthosis with an instrumented forearm crutch. The system integrates plantar pressure and motion sensing, vibrotactile feedback, and wireless communication via a smartphone application. We conducted an experimental user study with eight participants to validate the feasibility of the smart foot orthosis for mobile gait detection, explore the potential of haptic feedback for user interaction, and assess the usability of the accompanying mobile health application. Our work contributes to the field of smart assistive technology in rehabilitation and prevention by demonstrating a functional and comprehensive system. We further discuss system limitations, outline potential application scenarios, and provide recommendations for future development and clinical integration.
Authors:Dana Harari, Ofra Amir
Abstract:
Artificial intelligence (AI) assistants are increasingly embedded in workplace tools, raising the question of how initiative-taking shapes adoption. Prior work highlights trust and expectation mismatches as barriers, but the underlying psychological mechanisms remain unclear. Drawing on self-affirmation and social exchange theories, we theorize that unsolicited help elicits self-threat, reducing willingness to accept assistance, likelihood of future use, and performance expectancy. We report two vignette-based experiments (Study~1: $N=761$; Study~2: $N=571$, preregistered). Study~1 compared anticipatory and reactive help provided by an AI vs. a human, while Study~2 distinguished between \emph{offering} (suggesting help) and \emph{providing} (acting automatically). In Study 1, AI help was more threatening than human help. Across both studies, anticipatory help increased perceived threat and reduced adoption outcomes. Our findings identify self-threat as a mechanism explaining why proactive AI features may backfire and suggest design implications for AI initiative.
Authors:Akira Matsui, Kazuki Fujikawa, Ryo Sasaki, Ryo Adachi
Abstract:
Live streaming platforms offer a distinctive way for users and content creators to interact with each other through real-time communication. While research on user behavior in online platforms has explored how users discover their favorite content from creators and engage with them, the role of real-time features remains unclear. There are open questions as to what commonalities and differences exist in users' relationships with live streaming platforms compared to traditional on-demand style platforms. To understand this, we employ the concept of Exploration/Exploitation (E/E) and analyze a large-scale dataset from a live streaming platform over two years. Our results indicate that even on live streaming platforms, users exhibit E/E behavior but experience a longer exploration period. We also identify external factors, such as circadian rhythms, that influence E/E dynamics and user loyalty. The presented study emphasizes the importance of balancing E/E in online platform design, especially for live streaming platforms, providing implications that suggest design strategies for platform developers and content creators to facilitate timely engagement and retention.
Authors:Astrid van den Brandt, Sehi L'Yi, Huyen N. Nguyen, Anna Vilanova, Nils Gehlenborg
Abstract:
Multimodal interaction has been increasingly considered in designing visualization authoring tools. However, multimodal interaction has a broad meaning in visualization authoring, according to our literature review. Although some previous studies compare different authoring tools, a comprehensive overview of the diverse characteristics of multimodal interaction in visualization authoring tools is still missing. This paper seeks to offer a systematic perspective on how multimodal interaction is integrated within visualization authoring tools. Such an overview can enhance understanding of current practices, highlight distinguishing features among tools, and help identify future research directions, guiding designers in developing more accessible and effective authoring systems. We review 20 visualization authoring tools that incorporate multimodal interaction and characterize how multimodal interaction is applied in these tools. Based on the review results, we discuss design implications and future directions.
Authors:Duke Lin, Michael Paskett, Ying Yang
Abstract:
In human-computer interaction applications like hand gesture recognition, supervised learning models are often trained on a large population of users to achieve high task accuracy. However, due to individual variability in sensor signals and user behavior, static models may not provide optimal performance for all users. Personalizing pretrained models via calibration--collecting labeled data from each user--can improve performance but introduces user friction and struggles with limited data. To overcome these issues, we propose a calibrationless longitudinal personalization method: a contextual multi-arm bandit (MAB) algorithm combined with a pretrained neural network for gesture recognition. This reinforcement-learning-style approach enables personalization using binary reward signals, either user-provided or inferred by the system. We validated this method in a user study. Participants wore a surface electromyography (sEMG) device and played multiple rounds of a 2-D navigation game using six hand gestures. In the session, they completed a baseline round and then a round with our algorithm; in the second session, they played another round with our algorithm. Our approach led to a significant reduction in users' average false negative rate by 0.113 from the initial to the final round, with further decreases between sessions. Average precision also trended upward (by 0.139) from the start to end of a round, continuing in the next session. Notably, some users who could not complete the game with the baseline model succeeded with our contextual MAB model. In summary, our
Authors:Marcelino Garcia, Renato Garcia, Arthur Parizotto, Andre Mendes, Pedro Valle, Ricardo Vilela, Renato Balancieri, Williamson Silva
Abstract:
Educational chatbots have gained prominence as support tools for teaching programming, particularly in introductory learning contexts. This paper presents a Systematic Mapping Study (SMS) that investigated how such agents have been developed and applied in programming education. From an initial set of 3,216 publications, 54 studies were selected and analyzed based on five research subquestions, addressing chatbot types, programming languages used, educational content covered, interaction models, and application contexts. The results reveal a predominance of chatbots designed for Python instruction, focusing on fundamental programming concepts, and employing a wide variety of pedagogical approaches and technological architectures. In addition to identifying trends and gaps in the literature, this study provides insights to inform the development of new educational tools for programming instruction.
Authors:Emelia May Hughes, Tim Weninger
Abstract:
This study examines the structural dynamics of Truth Social, a politically aligned social media platform, during two major political events: the U.S. Supreme Court's overturning of Roe v. Wade and the FBI's search of Mar-a-Lago. Using a large-scale dataset of user interactions based on re-truths (platform-native reposts), we analyze how the network evolves in relation to fragmentation, polarization, and user influence. Our findings reveal a segmented and ideologically homogenous structure dominated by a small number of central figures. Political events prompt temporary consolidation around shared narratives, followed by rapid returns to fragmented, echo-chambered clusters. Centrality metrics highlight the disproportionate role of key influencers, particularly @realDonaldTrump, in shaping visibility and directing discourse. These results contribute to research on alternative platforms, political communication, and online network behavior, demonstrating how infrastructure and community dynamics together reinforce ideological boundaries and limit cross-cutting engagement.
Authors:Meihe Xu, Aurelia Tamò-Larrieux, Arianna Rossi
Abstract:
Individuals increasingly face an overwhelming number of tasks and decisions. To cope with the new reality, there is growing research interest in developing intelligent agents that can effectively assist people across various aspects of daily life in a tailored manner, with privacy emerging as a particular area of application. Artificial intelligence (AI) assistants for privacy, such as personalized privacy assistants (PPAs), have the potential to automatically execute privacy decisions based on users' pre-defined privacy preferences, sparing them the mental effort and time usually spent on each privacy decision. This helps ensure that, even when users feel overwhelmed or resigned about privacy, the decisions made by PPAs still align with their true preferences and best interests. While research has explored possible designs of such agents, user and expert perspectives on the acceptability of such AI-driven solutions remain largely unexplored. In this study, we conducted five focus groups with domain experts (n = 11) and potential users (n = 26) to uncover key themes shaping the acceptance of PPAs. Factors influencing the acceptability of AI assistants for privacy include design elements (such as information sources used by the agent), external conditions (such as regulation and literacy education), and systemic conditions (e.g., public or market providers and the need to avoid monopoly) to PPAs. These findings provide theoretical extensions to technology acceptance models measuring PPAs, insights on design, and policy implications for PPAs, as well as broader implications for the design of AI assistants.
Authors:Rhiannon Owen, Jonathan C. Roberts
Abstract:
Dementia care requires healthcare professionals to balance a patient's medical needs with a deep understanding of their personal needs, preferences, and emotional cues. However, current digital tools prioritise quantitative metrics over empathetic engagement,limiting caregivers ability to develop a deeper personal understanding of their patients. This paper presents an empathy centred visualisation framework, developed through a design study, to address this gap. The framework integrates established principles of person centred care with empathy mapping methodologies to encourage deeper engagement. Our methodology provides a structured approach to designing for indirect end users, patients whose experience is shaped by a tool they may not directly interact with. To validate the framework, we conducted evaluations with healthcare professinals, including usability testing of a working prototype and a User Experience Questionnaire study. Results suggest the feasibility of the framework, with participants highlighting its potential to support a more personal and empathetic relationship between medical staff and patients. The work starts to explore how empathy could be systematically embedded into visualisation design, as we contribute to ongoing efforts in the data visualisation community to support human centred, interpretable, and ethically aligned clinical care, addressing the urgent need to improve dementia patients experiences in hospital settings.
Authors:Alina Tausch, Magdalena Wischnewski, Mustafa Yalciner, Daniel Neider
Abstract:
This online-vignette study investigates the impact of certification and verification as measures for quality assurance of AI on trust and use of a robo-advisor. Confronting 520 participants with an imaginary situation where they were using an online banking service to invest their inherited money, we formed 4 experimental groups. EG1 achieved no further information of their robo-advisor, while EG2 was informed that their robo-advisor was certified by a reliable agency for unbiased processes, and EG3 was presented with a formally verified robo-advisor that was proven to consider their investment preferences. A control group was presented a remote certified human financial advisor. All groups had to decide on how much of their 10,000 euros they would give to their advisor to autonomously invest for them and report on trust and perceived dependability. A second manipulation happened afterwards, confronting participants with either a successful or failed investment. Overall, our results show that the level of quality assurance of the advisor had surprisingly near to no effect of any of our outcome variables, except for people's perception of their own mental model of the advisor. Descriptively, differences between investments show that seem to favor a verified advisor with a median investment of 65,000 euros (vs. 50,000). Success or failure information, though influences only partially by advisor quality, has been perceived as a more important clue for advisor trustworthiness, leading to substantially different trust and dependability ratings. The study shows the importance of thoroughly investigating not only trust, but also trusting behavior with objective measures. It also underlines the need for future research on formal verification, that might be the gold standard in proving AI mathematically, but seems not to take full effect as a cue for trustworthiness for end-users.
Authors:Benjamin Sturgeon, Daniel Samuelson, Jacob Haimes, Jacy Reese Anthis
Abstract:
As humans delegate more tasks and decisions to artificial intelligence (AI), we risk losing control of our individual and collective futures. Relatively simple algorithmic systems already steer human decision-making, such as social media feed algorithms that lead people to unintentionally and absent-mindedly scroll through engagement-optimized content. In this paper, we develop the idea of human agency by integrating philosophical and scientific theories of agency with AI-assisted evaluation methods: using large language models (LLMs) to simulate and validate user queries and to evaluate AI responses. We develop HumanAgencyBench (HAB), a scalable and adaptive benchmark with six dimensions of human agency based on typical AI use cases. HAB measures the tendency of an AI assistant or agent to Ask Clarifying Questions, Avoid Value Manipulation, Correct Misinformation, Defer Important Decisions, Encourage Learning, and Maintain Social Boundaries. We find low-to-moderate agency support in contemporary LLM-based assistants and substantial variation across system developers and dimensions. For example, while Anthropic LLMs most support human agency overall, they are the least supportive LLMs in terms of Avoid Value Manipulation. Agency support does not appear to consistently result from increasing LLM capabilities or instruction-following behavior (e.g., RLHF), and we encourage a shift towards more robust safety and alignment targets.
Authors:Oliver Child, Ollie Hanton, Jack Dawson, Steve Hodges, Mike Fraser
Abstract:
Consumer-level multi-material 3D printing with conductive thermoplastics enables fabrication of interactive elements for bespoke tangible devices. However, large feature sizes, high resistance materials, and limitations of printable control circuitry mean that deployable devices cannot be printed without post-print assembly steps. To address these challenges, we present Printegrated Circuits, a technique that uses traditional electronics as material to 3D print self-contained interactive objects. Embedded PCBs are placed into recesses during a pause in the print, and through a process we term \textit{Prinjection}, conductive filament is injected into their plated-through holes. This automatically creates reliable electrical and mechanical contact, eliminating the need for manual wiring or bespoke connectors. We describe the custom machine code generation that supports our approach, and characterise its electrical and mechanical properties. With our 6 demonstrations, we highlight how the Printegrated Circuits process fits into existing design and prototyping workflows as well as informs future research agendas.
Authors:Michael Correll, Lane Harrison
Abstract:
In this provocation, we suggest that much (although not all) current uncertainty visualization simplifies the myriad forms of uncertainty into error bars around an estimate. This apparent simplification into error bars comes only as a result of a vast metaphysics around uncertainty and probability underlying modern statistics. We use examples from religion to present alternative views of uncertainty (metaphysical or otherwise) with the goal of enriching our conception of what kind of uncertainties we ought to visualize, and what kinds of people we might be visualizing those uncertainties for.
Authors:Yulin Yu, Houming Chen, Daniel Romero, Paramveer S. Dhillon
Abstract:
Social media platforms offer users multiple ways to engage with content--likes, retweets, and comments--creating a complex signaling system within the attention economy. While previous research has examined factors driving overall engagement, less is known about why certain tweets receive unexpectedly high levels of one type of engagement relative to others. Drawing on Signaling Theory and Attention Economy Theory, we investigate these unexpected engagement patterns on Twitter (now known as "X"), developing an "unexpectedness quotient" to quantify deviations from predicted engagement levels. Our analysis of over 600,000 tweets reveals distinct patterns in how content characteristics influence unexpected engagement. News, politics, and business tweets receive more retweets and comments than expected, suggesting users prioritize sharing and discussing informational content. In contrast, games and sports-related topics garner unexpected likes and comments, indicating higher emotional investment in these domains. The relationship between content attributes and engagement types follows clear patterns: subjective tweets attract more likes while objective tweets receive more retweets, and longer, complex tweets with URLs unexpectedly receive more retweets. These findings demonstrate how users employ different engagement types as signals of varying strength based on content characteristics, and how certain content types more effectively compete for attention in the social media ecosystem. Our results offer valuable insights for content creators optimizing engagement strategies, platform designers facilitating meaningful interactions, and researchers studying online social behavior.
Authors:James M. Berzuk, Lauren Corcoran, Brannen McKenzie-Lefurgey, Katie Szilagyi, James E. Young
Abstract:
Contemporary robots are increasingly mimicking human social behaviours to facilitate interaction, such as smiling to signal approachability, or hesitating before taking an action to allow people time to react. Such techniques can activate a person's entrenched social instincts, triggering emotional responses as though they are interacting with a fellow human, and can prompt them to treat a robot as if it truly possesses the underlying life-like processes it outwardly presents, raising significant ethical questions. We engage these issues through the lens of informed consent: drawing upon prevailing legal principles and ethics, we examine how social robots can influence user behaviour in novel ways, and whether under those circumstances users can be appropriately informed to consent to these heightened interactions. We explore the complex circumstances of human-robot interaction and highlight how it differs from more familiar interaction contexts, and we apply legal principles relating to informed consent to social robots in order to reconceptualize the current ethical debates surrounding the field. From this investigation, we synthesize design goals for robot developers to achieve more ethical and informed human-robot interaction.
Authors:Hieu Tran, Go-Eum Cha, Sooyeon Jeong
Abstract:
As social robots get more deeply integrated intoour everyday lives, they will be expected to engage in meaningful conversations and exhibit socio-emotionally intelligent listening behaviors when interacting with people. Active listening and backchanneling could be one way to enhance robots' communicative capabilities and enhance their effectiveness in eliciting deeper self-disclosure, providing a sense of empathy,and forming positive rapport and relationships with people.Thus, we developed an LLM-powered social robot that can exhibit contextually appropriate sentiment-based backchannelingand active listening behaviors (active listening+backchanneling) and compared its efficacy in eliciting people's self-disclosurein comparison to robots that do not exhibit any of these listening behaviors (control) and a robot that only exhibitsbackchanneling behavior (backchanneling-only). Through ourexperimental study with sixty-five participants, we found theparticipants who conversed with the active listening robot per-ceived the interactions more positively, in which they exhibited the highest self-disclosures, and reported the strongest senseof being listened to. The results of our study suggest that the implementation of active listening behaviors in social robotshas the potential to improve human-robot communication andcould further contribute to the building of deeper human-robot relationships and rapport.
Authors:Kyle Coutray, Wanyea Barbel, Zack Groth, Joseph J LaViola
Abstract:
Brain-Computer Interfaces (BCIs) have traditionally been studied in clinical and laboratory contexts, but the rise of consumer-grade devices now allows exploration of their use in daily activities. Virtual reality (VR) provides a particularly relevant domain, where existing input methods often force trade-offs between speed, accuracy, and physical effort. This study introduces NeuroGaze, a hybrid interface combining electroencephalography (EEG) with eye tracking to enable hands-free interaction in immersive VR. Twenty participants completed a 360° cube-selection task using three different input methods: VR controllers, gaze combined with a pinch gesture, and NeuroGaze. Performance was measured by task completion time and error rate, while workload was evaluated using the NASA Task Load Index (NASA-TLX). NeuroGaze successfully supported target selection with off-the-shelf hardware, producing fewer errors than the alternative methods but requiring longer completion times, reflecting a classic speed-accuracy tradeoff. Workload analysis indicated reduced physical demand for NeuroGaze compared to controllers, though overall ratings and user preferences were mixed. These findings demonstrate the feasibility of hybrid EEG+gaze systems for everyday VR use, highlighting their ergonomic benefits and inclusivity potential. Although not yet competitive in speed, NeuroGaze points toward a practical role for consumer-grade BCIs in accessibility and long-duration applications, and underscores the need for improved EEG signal processing and adaptive multimodal integration to enhance future performance.
Authors:Pierre Lambert, Edouard Couplet, Michel Verleysen, John Aldo Lee
Abstract:
Neighbour embeddings (NE) allow the representation of high dimensional datasets into lower dimensional spaces and are often used in data visualisation. In practice, accelerated approximations are employed to handle very large datasets. Accelerating NE is challenging, and two main directions have been explored: very coarse approximations based on negative sampling (as in UMAP) achieve high effective speed but may lack quality in the extracted structures; less coarse approximations, as used in FIt-SNE or BH-t-SNE, offer better structure preservation at the cost of speed, while also restricting the target dimensionality to 2 or 3, limiting NE to visualisation. In some variants, the precision of these costlier accelerations also enables finer-grained control on the extracted structures through dedicated hyperparameters. This paper proposes to bridge the gab between both approaches by introducing a novel way to accelerate NE, requiring a small number of computations per iteration while maintaining good fine-grained structure preservation and flexibility through hyperparameter tuning, without limiting the dimensionality of the embedding space. The method was designed for interactive exploration of data; as such, it abandons the traditional two-phased approach of other NE methods, allowing instantaneous visual feedback when changing hyperparameters, even when these control processes happening on the high-dimensional side of the computations. Experiments using a publicly available, GPU accelerated GUI integration of the method show promising results in terms of speed, flexibility in the structures getting extracted, and show potential uses in broader machine learning contexts with minimal algorithmic modifications. Central to this algorithm is a novel approach to iterative approximate nearest neighbour search, which shows promising results compared to nearest neighbour descent.
Authors:Tamlin Love, Antonio Andriella, Guillem AlenyÃ
Abstract:
Explainability is a critical tool in helping stakeholders understand robots. In particular, the ability for robots to explain why they have made a particular decision or behaved in a certain way is useful in this regard. Behaviour trees are a popular framework for controlling the decision-making of robots and other software systems, and thus a natural question to ask is whether or not a system driven by a behaviour tree is capable of answering "why" questions. While explainability for behaviour trees has seen some prior attention, no existing methods are capable of generating causal, counterfactual explanations which detail the reasons for robot decisions and behaviour. Therefore, in this work, we introduce a novel approach which automatically generates counterfactual explanations in response to contrastive "why" questions. Our method achieves this by first automatically building a causal model from the structure of the behaviour tree as well as domain knowledge about the state and individual behaviour tree nodes. The resultant causal model is then queried and searched to find a set of diverse counterfactual explanations. We demonstrate that our approach is able to correctly explain the behaviour of a wide range of behaviour tree structures and states. By being able to answer a wide range of causal queries, our approach represents a step towards more transparent, understandable and ultimately trustworthy robotic systems.
Authors:Diana Mihalache, Dalila Szostak
Abstract:
This chapter focuses on the intersection of user experience (UX) and wellbeing in the context of content moderation. Human content moderators play a key role in protecting end users from harm by detecting, evaluating, and addressing content that may violate laws or product policies. They face numerous challenges, including exposure to sensitive content, monotonous tasks, and complex decisions, which are often exacerbated by inadequate tools. This chapter explains the importance of incorporating wellbeing considerations throughout the product development lifecycle, offering a framework and practical strategies for implementation across key UX disciplines: research, writing, and design. By examining these considerations, this chapter provides a roadmap for creating user experiences that support content moderators, benefiting both the user and the business.
Authors:Agam Oberlender, Hadas Erel
Abstract:
It is well established that people perceive robots as social entities, even when they are not designed for social interaction. We evaluated whether the social interpretation of robotic gestures should also be considered when turning off a robot. In the experiment, participants engaged in a brief preliminary neutral interaction while a robotic arm showed interest in their actions. At the end of the task, participants were asked to turn off the robotic arm under two conditions: (1) a Non-designed condition, where all of the robot's engines were immediately and simultaneously turned off, as robots typically shut down; (2) a Designed condition, where the robot's engines gradually folded inward in a motion resembling "falling asleep." Our findings revealed that all participants anthropomorphized the robot's movement when it was turned off. In the Non-designed condition, most participants interpreted the robot's turn-off movement negatively, as if the robot had "died." In the Designed condition, most participants interpreted it more neutrally, stating that the robot "went to sleep." The robot's turn-off movement also impacted its perception, leading to higher likeability, perceived intelligence, and animacy in the Designed condition. We conclude that the impact of common edge interactions, such as turning off a robot, should be carefully designed while considering people's automatic tendency to perceive robots as social entities.
Authors:Jingwen Qin, Semen Checherin, Yue Li, Berend-Jan van der Zwaag, Ozlem Durmaz-Incel
Abstract:
Color Vision Deficiency (CVD) affects nearly 8 percent of men and 0.5 percent of women worldwide. Existing color-correction methods often rely on prior clinical diagnosis and static filtering, making them less effective for users with mild or moderate CVD. In this paper, we introduce Hue4U, a personalized, real-time color-correction system in augmented reality using consumer-grade Meta Quest headsets. Unlike previous methods, Hue4U requires no prior medical diagnosis and adapts to the user in real time. A user study with 10 participants showed notable improvements in their ability to distinguish colors. The results demonstrated large effect sizes (Cohen's d > 1.4), suggesting clinically meaningful gains for individuals with CVD. These findings highlight the potential of personalized AR interventions to improve visual accessibility and quality of life for people affected by CVD.
Authors:Carlos A. Pinheiro de Sousa, Niklas Gröne, Mathias Günther, Oliver Deussen
Abstract:
We introduce a multi-user VR co-location framework that synchronizes users within a shared virtual environment aligned to physical space. Our approach combines a motion capture system with SLAM-based inside-out tracking to deliver smooth, high-framerate, low-latency performance. Previous methods either rely on continuous external tracking, which introduces latency and jitter, or on one-time calibration, which cannot correct drift over time. In contrast, our approach combines the responsiveness of local HMD SLAM tracking with the flexibility to realign to an external source when needed. It also supports real-time pose sharing across devices, ensuring consistent spatial alignment and engagement between users. Our evaluation demonstrates that our framework achieves the spatial accuracy required for natural multi-user interaction while offering improved comfort, scalability, and robustness over existing co-located VR solutions.
Authors:Raphaël Weuts, Johannes Bleher, Hannah Bleher, Rozanne Tuesday Flores, Guo Xuanyang, PaweŠPujszo, Zsolt Almási
Abstract:
As artificial intelligence (AI) systems permeate critical sectors, the need for professionals who can address ethical, legal and governance challenges has become urgent. Current AI ethics education remains fragmented, often siloed by discipline and disconnected from practice. This paper synthesizes literature and regulatory developments to propose a modular, interdisciplinary curriculum that integrates technical foundations with ethics, law and policy. We highlight recurring operational failures in AI - bias, misspecified objectives, generalization errors, misuse and governance breakdowns - and link them to pedagogical strategies for teaching AI governance. Drawing on perspectives from the EU, China and international frameworks, we outline a semester plan that emphasizes integrated ethics, stakeholder engagement and experiential learning. The curriculum aims to prepare students to diagnose risks, navigate regulation and engage diverse stakeholders, fostering adaptive and ethically grounded professionals for responsible AI governance.
Authors:Yuxin Zhang, Fan Zhang, Jinjun Xia, Chao Zhao
Abstract:
This study adopts a design-oriented approach to integrate traditional braids with commonly used matrix materials, developing creative materials with different sensory properties by altering matrix material types and braid patterns. Based on these creative materials, a quantitative and structured model is proposed to assist designers understanding the material experience process and guide material selection by analyzing the relationship between material properties and sensory perception. Specifically, participants evaluated the creative materials under visual-tactile conditions using a 7-point semantic differential (SD) scale. Correlation analysis was performed to explore the data. The main and interaction effects of matrix materials and braid patterns on impression evaluation were analyzed using two-way analysis of variance (ANOVA). A structural equation model (SEM) was constructed based on exploratory factor analysis (EFA), and path coefficients were computed to assess the relative importance of material properties in determining material attractiveness. The results show that, compared to braids, the creative materials resulted in significant changes in impression evaluation. Furthermore, the creative materials can be understood through intrinsic, aesthetic, and physical properties, with their standardized regression coefficients for material attractiveness of 0.486, 0.650, and 0.103, respectively. These properties are interrelated and under their combined influence affect the attractiveness of the material. Therefore, designers should consider utilizing these relationships to enhance sensory experience in order to achieve design objectives. Moreover, designers should also consider balancing technology and experience, using materials according to the principle of "form follows function".
Authors:Omar Elgohary, Zhu-Tien
Abstract:
Each year, multi-modal interaction continues to grow within both industry and academia. However, researchers have yet to fully explore the impact of multi-modal systems on learning and memory retention. This research investigates how combining gaze-based controls with gesture navigation affects information retention when compared to standard track-pad usage. A total of twelve participants read four textual articles through two different user interfaces which included a track-pad and a multi-modal interface that tracked eye movements and hand gestures for scrolling, zooming, and revealing content. Participants underwent two assessment sessions that measured their information retention immediately and after a twenty-four hour period along with the NASA-TLX workload evaluation and the System Usability Scale assessment. The initial analysis indicates that multi-modal interaction produces similar targeted information retention to traditional track-pad usage, but this neutral effect comes with higher cognitive workload demands and seems to deteriorate with long-term retention. The research results provide new knowledge about how multi-modal systems affect cognitive engagement while providing design recommendations for future educational and assistive technologies that require effective memory performance.
Authors:Ruiqi Chen, Qingyang He, Hanxi Bao, Jung Choi, Xin Tong
Abstract:
Graffiti has long documented the socio-cultural landscapes of urban spaces, yet increasing global regulations have constrained artists' creative freedom, prompting exploration of digital alternatives. Augmented Reality (AR) offers opportunities to extend graffiti into digital environments while retaining spatial and cultural significance, but prior research has largely centered on audience engagement rather than the embodied creative processes of graffiti artists. To address this, we developed GestoBrush, a mobile AR prototype that turns smartphones into virtual spray cans, enabling graffiti creation through embodied gestures. A co-design workshop underscored the role of embodiment-physical engagement with surroundings and body-driven creative processes-in digital workflows. We evaluated GestoBrush with six graffiti artists and findings suggested that embodied AR interactions supporting artists bypass real-world constraints and explore new artistic possibilities, whose AR artworks created enhanced senses of intuitiveness, immersion, and expressiveness. This work highlight how embodied AR tools can bridge the gap between physical graffiti practice and digital expression, suggesting pathways for designing immersive creative systems that respect the cultural ethos of street art while expanding its possibilities in virtual spaces.
Authors:Rui Xi, Xianghan Wang
Abstract:
Loneliness and social isolation pose significant emotional and health challenges, prompting the development of technology-based solutions for companionship and emotional support. This paper introduces Livia, an emotion-aware augmented reality (AR) companion app designed to provide personalized emotional support by combining modular artificial intelligence (AI) agents, multimodal affective computing, progressive memory compression, and AR driven embodied interaction. Livia employs a modular AI architecture with specialized agents responsible for emotion analysis, dialogue generation, memory management, and behavioral orchestration, ensuring robust and adaptive interactions. Two novel algorithms-Temporal Binary Compression (TBC) and Dynamic Importance Memory Filter (DIMF)-effectively manage and prioritize long-term memory, significantly reducing storage requirements while retaining critical context. Our multimodal emotion detection approach achieves high accuracy, enhancing proactive and empathetic engagement. User evaluations demonstrated increased emotional bonds, improved satisfaction, and statistically significant reductions in loneliness. Users particularly valued Livia's adaptive personality evolution and realistic AR embodiment. Future research directions include expanding gesture and tactile interactions, supporting multi-user experiences, and exploring customized hardware implementations.
Authors:Sujit Kumar Sikder, Jyotirmaya Ijaradar, Hao Li, Hichem Omrani
Abstract:
Many transport authorities are collecting and publishing almost real-time road traffic data to meet the growing trend of massive open data, a vital resource for foresight decision support systems considering deep data insights. We explored the spatio-temporal transitions in the cross-country road traffic volumes in the context of modelling behavioural transitions in car-based human mobility. This study reports on individual car-based daily travel behaviour detected, before (2018) and during the COVID pandemic (2020), between Germany and neighbouring countries. In the case of Luxembourg, the Bridges and Roads Authority has installed a large digital traffic observatory infrastructure through the adoption of sensor-based IoT technologies, like other European member states. Since 2016, they have provided high-performance data processing and published open data on the country's road traffic. The dataset contains an hourly traffic count for different vehicle types, daily for representative observation points, followed by a major road network. The original dataset contains significant missing entries, so comprehensive data harmonization was performed. We observed the decrease in traffic volumes during pandemic factors (e.g. lockdowns and remote work) period by following global trend of reduced personal mobility. The understanding the dynamic adaptive travel behaviours provide a potential opportunity to generate the actionable insight including temporal and spatial implications. This study demonstrates that the national open traffic data products can have adoption potential to address cross-border insights. In relevance to the net-zero carbon transition, further study should shed light on the interpolation and downscaling approaches at the comprehensive road-network level for identifying pollution hot spots, causal link to functional landuse patterns and calculation of spatial influence area.
Authors:BÅażej Kotowski, Nicholas Evans, Behzad Haki, Frederic Font, Sergi JordÃ
Abstract:
This paper investigates GrooveTransformer, a real-time rhythm generation system, through the postphenomenological framework of Variational Cross-Examination (VCE). By reflecting on its deployment across three distinct artistic contexts, we identify three stabilities: an autonomous drum accompaniment generator, a rhythmic control voltage sequencer in Eurorack format, and a rhythm driver for a harmonic accompaniment system. The versatility of its applications was not an explicit goal from the outset of the project. Thus, we ask: how did this multistability emerge? Through VCE, we identify three key contributors to its emergence: the affordances of system invariants, the interdisciplinary collaboration, and the situated nature of its development. We conclude by reflecting on the viability of VCE as a descriptive and analytical method for Digital Musical Instrument (DMI) design, emphasizing its value in uncovering how technologies mediate, co-shape, and are co-shaped by users and contexts.
Authors:Verena Biener, Florian Jack Winston, Dieter Schmalstieg, Alexander Plopski
Abstract:
Extended Reality (XR) is increasingly used as a productivity tool and recent commercial XR devices have even been specifically designed as productivity tools, or, at least, are heavily advertised for such purposes, such as the Apple Vision Pro (AVP), which has now been available for more than one year. In spite of what marketing suggests, research still lacks an understanding of the long-term usage of such devices in ecologically valid everyday settings, as most studies are conducted in very controlled environments. Therefore, we conducted interviews with ten AVP users to better understand how experienced users engage with the device, and which limitations persist. Our participants report that XR can increase productivity and that they got used to the device after some time. Yet, a range of limitations persist that might hinder the widespread use of XR as a productivity tool, such as a lack of native applications, difficulties when integrating XR into current workflows, and limited possibilities to adapt and customize the XR experience.
Authors:Dominik Pegler, David Steyrl, Mengfan Zhang, Alexander Karner, Jozsef Arato, Frank Scharnowski, Filip Melinscak
Abstract:
Advances in computer vision have opened new avenues for clinical applications, particularly in computerized exposure therapy where visual stimuli can be dynamically adjusted based on patient responses. As a critical step toward such adaptive systems, we investigated whether pretrained computer vision models can accurately predict fear levels from spider-related images. We adapted three diverse models using transfer learning to predict human fear ratings (on a 0-100 scale) from a standardized dataset of 313 images. The models were evaluated using cross-validation, achieving an average mean absolute error (MAE) between 10.1 and 11.0. Our learning curve analysis revealed that reducing the dataset size significantly harmed performance, though further increases yielded no substantial gains. Explainability assessments showed the models' predictions were based on spider-related features. A category-wise error analysis further identified visual conditions associated with higher errors (e.g., distant views and artificial/painted spiders). These findings demonstrate the potential of explainable computer vision models in predicting fear ratings, highlighting the importance of both model explainability and a sufficient dataset size for developing effective emotion-aware therapeutic technologies.
Authors:Nusrat Jahan Lamia, Sadia Afrin Mim
Abstract:
Fashion has evolved from handcrafted designs to automated production over the years, where AI has added another dimension to it. Nowadays, practically every industry uses artificial models to automate their operations. To explore their role, we examined three prominent LLMs (OpenAI, GeminiAI, Deepseek) in multiple stages of textile manufacturing (e.g., sustainable choice, cost effectiveness, production planning, etc.). We assessed the models' ability to replicate garment design using certain parameters (fabric construction, shade, weave, silhouette, etc.). We compared the models in terms of different body types and functional purposes (e.g., fashionwear, sportswear) so that designers could evaluate effectiveness before developing actual patterns, make necessary modifications, and conduct fashion forecasting beforehand. To facilitate deeper analysis, we created a custom dataset specifically for fabric image generation and classification. Our analysis revealed that, in terms of fabric construction, the OpenAI DALL-E model integrated with ChatGPT outperformed other models, achieving a lower LPIPS (Learned Perceptual Image Patch Similarity) score of approximately $0.2$. In fabric classification from images, we found OpenAI offered the best results by breaking down certain factors (e.g., breathability, moisture-wicking, and tactile comfort), achieving approximately $80\%$ accuracy for base construction and $55\%$ for detailed construction. However, our results indicate that Deepseek faced significant challenges in generating and recognizing fabric images. Overall, all the models struggled to recognize complex fabric constructions and intricate designs from images, and relying too much on AI might hinder human creativity. We also observed that all three models performed effectively in providing recommendations and insights for fabric design in textual form.
Authors:Swapnika Dulam, Christopher L Dancy
Abstract:
Despite the continued anthropomorphization of AI systems, the potential impact of racialization during human-AI interaction is understudied. This study explores how human-AI cooperation may be impacted by the belief that data used to train an AI system is racialized, that is, it was trained on data from a specific group of people. During this study, participants completed a human-AI cooperation task using the Pig Chase game. Participants of different self-identified demographics interacted with AI agents whose perceived racial identities were manipulated, allowing us to assess how sociocultural perspectives influence the decision-making of participants in the game. After the game, participants completed a survey questionnaire to explain the strategies they used while playing the game and to understand the perceived intelligence of their AI teammates. Statistical analysis of task behavior data revealed a statistically significant effect of the participant's demographic, as well as the interaction between this self-identified demographic and the treatment condition (i.e., the perceived demographic of the agent). The results indicated that Non-White participants viewed AI agents racialized as White in a positive way compared to AI agents racialized as Black. Both Black and White participants viewed the AI agent in the control treatment in a negative way. A baseline cognitive model of the task using ACT-R cognitive architecture was used to understand a cognitive-level, process-based explanation of the participants' perspectives based on results found from the study. This model helps us better understand the factors affecting the decision-making strategies of the game participants. Results from analysis of these data, as well as cognitive modeling, indicate a need to expand understanding of the ways racialization (whether implicit or explicit) impacts interaction with AI systems.
Authors:Kristina L. Kupferschmidt, Kieran O'Doherty, Joshua A. Skorburg
Abstract:
Large Language Models (LLMs) are often proposed as tools to streamline clinical documentation, a task viewed as both high-volume and low-risk. However, even seemingly straightforward applications of LLMs raise complex sociotechnical considerations to translate into practice. This case study, conducted at KidsAbility, a pediatric rehabilitation facility in Ontario, Canada examined the use of LLMs to support occupational therapists in reducing documentation burden.We conducted a qualitative study involving 20 clinicians who participated in pilot programs using two AI technologies: a general-purpose proprietary LLM and a bespoke model fine-tuned on proprietary historical documentation. Our findings reveal that documentation challenges are sociotechnical in nature, shaped by clinical workflows, organizational policies, and system constraints. Four key themes emerged: (1) the heterogeneity of workflows, (2) the documentation burden is systemic and not directly linked to the creation of any single type of documentation, (3) the need for flexible tools and clinician autonomy, and (4) effective implementation requires mutual learning between clinicians and AI systems. While LLMs show promise in easing documentation tasks, their success will depend on flexible, adaptive integration that supports clinician autonomy. Beyond technical performance, sustained adoption will require training programs and implementation strategies that reflect the complexity of clinical environments.
Authors:Keara Schaaij, Roel Boumans, Tibor Bosse, Iris Hendrickx
Abstract:
Lexical alignment, where speakers start to use similar words across conversation, is known to contribute to successful communication. However, its implementation in conversational agents remains underexplored, particularly considering the recent advancements in large language models (LLMs). As a first step towards enabling lexical alignment in human-agent dialogue, this study draws on strategies for personalising conversational agents and investigates the construction of stable, personalised lexical profiles as a basis for lexical alignment. Specifically, we varied the amounts of transcribed spoken data used for construction as well as the number of items included in the profiles per part-of-speech (POS) category and evaluated profile performance across time using recall, coverage, and cosine similarity metrics. It was shown that smaller and more compact profiles, created after 10 min of transcribed speech containing 5 items for adjectives, 5 items for conjunctions, and 10 items for adverbs, nouns, pronouns, and verbs each, offered the best balance in both performance and data efficiency. In conclusion, this study offers practical insights into constructing stable, personalised lexical profiles, taking into account minimal data requirements, serving as a foundational step toward lexical alignment strategies in conversational agents.
Authors:Gautam Khannaa, Yeliz Yesilada, Sukru Eraslan, Simon Harper
Abstract:
Twitter is one of the most popular social media platforms.With a large number of tweets, the activity feed of users becomes noisy, challenging to read, and most importantly tweets often get lost. We present a new approach to personalise the ranking of the tweets toward solving the problem of information overload which is achieved by analysing the relationship between the importance of tweets to the frequency at which the author tweets. The hypothesis tested is that "low-frequency tweeters have more to say", i.e. if a user who tweets infrequently actually goes to the effort of tweeting, then it is more likely to be of more importance or contain more "meaning" than a tweet by a user who tweets continuously. We propose six new measures to evaluate the importance of tweets based on the ability of the tweet to drive interaction among its readers, which is measured through metrics such as retweets, favourites, and comments, and the extent of the author's network interacting with the tweet. Our study shows that users who tweeted less than ten tweets per week were more likely to be perceived as important by their followers and have the most important messages. This identified tweet-frequency band could be used to reorder the activity feed of users and such reordering would ensure the messages of low-frequency tweeters do not get lost in the stream of tweets. This could also serve as a scoring index for Twitter users to identify users frequently tweeting important messages.
Authors:Shadeeb Hossain, Natalie Sommer, Neda Adib
Abstract:
This paper presents a dynamic gamification architecture for an Extended Reality Artificial Intelligence virtual training environment designed to enhance STEM education through immersive adaptive, and kinesthetic learning. The proposed system can be introduced in four phases: Introduction Phase, Component Development Phase, Fault Introduction and Correction Phase and Generative AI XR scenarios Phase. Security and privacy are discussed via a defense-in-depth approach spanning client, middleware, and backend layers, incorporating AES 256 encryption, multi-factor authentication, role-based access control and GDPR or FERPA compliance. Risks such as sensor exploitation, perceptual manipulation, and virtual physical harm are identified, with mitigation strategies embedded at the design stage. Potential barriers to large scale adoption-including technical complexity, cost of deployment, and need for cybersecurity expertise are discussed.
Authors:Ryo Yonetani, Kotaro Hara
Abstract:
This paper presents Collective Landmark Mapper, a novel map-as-a-by-product system for generating semantic landmark maps of indoor environments. Consider users engaged in situated tasks that require them to navigate these environments and regularly take notes on their smartphones. Collective Landmark Mapper exploits the smartphone's IMU data and the user's free text input during these tasks to identify a set of landmarks encountered by the user. The identified landmarks are then aggregated across multiple users to generate a unified map representing the positions and semantic information of all landmarks. In developing the proposed system, we focused specifically on retail applications and conducted a formative interview with stakeholders to confirm their practical needs that motivate the map-as-a-byproduct approach. Our user study demonstrates the feasibility of the proposed system and its superior mapping performance in two different setups: creating a product availability map from restocking checklist tasks at a retail store and constructing a room usage map from office inspection tasks, further demonstrating the potential applicability to non-retail applications.
Authors:Md Mhamud Hussen Sifat, Md Maruf, Md Rokunuzzaman
Abstract:
The utilization of robotic technology has gained traction in healthcare facilities due to progress in the field that enables time and cost savings, minimizes waste, and improves patient care. Digital healthcare technologies that leverage automation, such as robotics and artificial intelligence, have the potential to enhance the sustainability and profitability of healthcare systems in the long run. However, the recent COVID-19 pandemic has amplified the need for cyber-physical robots to automate check-ups and medication administration. A robot nurse is controlled by the Internet of Things (IoT) and can serve as an automated medical assistant while also allowing supervisory control based on custom commands. This system helps reduce infection risk and improves outcomes in pandemic settings. This research presents a test case with a nurse robot that can assess a patient's health status and take action accordingly. We also evaluate the system's performance in medication administration, health-status monitoring, and life-cycle considerations.
Authors:Nefeli Manoudaki, Mert Toka, Iason Paterakis, Diarmid Flatley
Abstract:
Simulacra Naturae is a data-driven media installation that explores collective care through the entanglement of biological computation, material ecologies, and generative systems. The work translates pre-recorded neural activity from brain organoids, lab-grown three-dimensional clusters of neurons, into a multi-sensory environment composed of generative visuals, spatial audio, living plants, and fabricated clay artifacts. These biosignals, streamed through a real-time system, modulate emergent agent behaviors inspired by natural systems such as termite colonies and slime molds. Rather than using biosignals as direct control inputs, Simulacra Naturae treats organoid activity as a co-creative force, allowing neural rhythms to guide the growth, form, and atmosphere of a generative ecosystem. The installation features computationally fabricated clay prints embedded with solenoids, adding physical sound resonances to the generative surround composition. The spatial environment, filled with live tropical plants and a floor-level projection layer featuring real-time generative AI visuals, invites participants into a sensory field shaped by nonhuman cognition. By grounding abstract data in living materials and embodied experience, Simulacra Naturae reimagines visualization as a practice of care, one that decentralizes human agency and opens new spaces for ethics, empathy, and ecological attunement within hybrid computational systems.
Authors:Sandra C. Matz, C. Blaine Horton, Sofie Goethals
Abstract:
Large language models (LLMs) increasingly act on people's behalf: they write emails, buy groceries, and book restaurants. While the outsourcing of human decision-making to AI can be both efficient and effective, it raises a fundamental question: how does delegating identity-defining choices to AI reshape who people become? We study the impact of agentic LLMs on two identity-relevant outcomes: interpersonal distinctiveness - how unique a person's choices are relative to others - and intrapersonal diversity - the breadth of a single person's choices over time. Using real choices drawn from social-media behavior of 1,000 U.S. users (110,000 choices in total), we compare a generic and personalized agent to a human baseline. Both agents shift people's choices toward more popular options, reducing the distinctiveness of their behaviors and preferences. While the use of personalized agents tempers this homogenization (compared to the generic AI), it also more strongly compresses the diversity of people's preference portfolios by narrowing what they explore across topics and psychological affinities. Understanding how AI agents might flatten human experience, and how using generic versus personalized agents involves distinctiveness-diversity trade-offs, is critical for designing systems that augment rather than constrain human agency, and for safeguarding diversity in thought, taste, and expression.
Authors:Irene Zeng, Neda Barbazi, Ji Youn Shin, Gurumurthy Hiremath, Carlye Anne Lauff
Abstract:
Children with congenital heart disease (CHD) often face challenges that require them to understand complex medical information from an early age in order to support lifelong care and improve health outcomes. However, prior research has rarely included young children in designing and evaluating digital tools to support health education using developmentally appropriate strategies. This study is part of a multi-phase research involving participatory design (PD), user testing, and iterative development. We present the design and refinement of a digital application that introduces basic information about CHD, including heart anatomy and healthy habits, through metaphor-based gameplay. User testing sessions with 30 children informed the redesign of interactive activities aligned with specific health conditions. Findings highlight usability, engagement, and comprehension outcomes and reveal design opportunities for supporting health literacy through serious game (SG) principles. These results inform the next phase, including further testing, refinement, and deployment in home and clinical settings.
Authors:Arthur Bran Herbener, Malene Flensborg Damholdt
Abstract:
The question of whether artificial agents (e.g., chatbots and social robots) can replace human therapists has received notable attention following the recent launch of large language models. However, little is known about the processes of change in psychotherapy delivered by artificial agents. To facilitate hypothesis development and stimulate scientific debate, the present article offers the first theoretical framework of the processes of change in psychotherapy delivered by artificial agents. The theoretical framework rests upon a conceptual analysis of what active ingredients may be inherently linked to the presence of human therapists. We propose that human therapists' ontological status as human beings and sociocultural status as socially sanctioned healthcare professionals play crucial roles in promoting treatment outcomes. In the absence of the ontological and sociocultural status of human therapists, we propose what we coin the genuineness gap and credibility gap can emerge and undermine key processes of change in psychotherapy. Based on these propositions, we propose avenues for scientific investigations and practical applications aimed at leveraging the strengths of artificial agents and human therapists respectively. We also highlight the intricate agentic nature of artificial agents and discuss how this complicates endeavors to establish universally applicable propositions regarding the processes of change in these interventions.
Authors:Md. Moktader Moula, Israt Jahan Shonom, Azharul Islam, Mohammad Mosharraf Hossain
Abstract:
Monitoring vegetation dynamics is crucial for addressing global environmental challenges like degradation and deforestation, but traditional remote sensing methods are often complex and resource-intensive. To overcome these barriers, we developed an interactive, cloud-based application on the Google Earth Engine (GEE) platform for few clicks on-demand global vegetation analysis without complex technical knowledge. The application automates the calculation of vegetated areas using the Normalized Difference Vegetation Index (NDVI) derived from Sentinel-2 and Landsat imagery. It utilizes a median composite of images over a selected period to create a single, robust, cloud-free image, minimizing atmospheric noise and other artifacts. It offers a flexible, global multi-scale analytical platform, allowing users to define regions of interest based on administrative boundaries, protected areas, or custom-drawn polygons. The user-friendly interface enables the selection of specific time periods and NDVI thresholds to quantify vegetation cover in real time, eliminating the need for manual and time intensive data handling and processing. A validation of the platform was conducted for two protected areas in Bangladesh which demonstrated high accuracy, with area estimates showing over 97% agreement with published reference data. By simplifying access to powerful geospatial analytics to general people, this tool provides a scalable and practical solution for researchers, land managers, policymakers, and any interested person to monitor vegetation trends, support conservation efforts, to inform decision making in spatial context where policy maker need to use insights in few clicks and inform environmental policy.
Authors:Yannick Weiss, Marlene Eder, Oguzhan Cesur, Steeven Villa
Abstract:
Thermal sensations are central to how we experience the world, yet most virtual and extended reality systems fail to simulate them effectively. While hardware-based thermal displays can provide accurate temperature changes, they are often bulky, power-intensive, and restrict user mobility. Consequently, recent works have explored thermal illusions, perceptual effects that rely on cross-modal interactions, to achieve thermal experiences without physical heating or cooling. While thermal illusions have been shown to consistently alter subjective ratings, the actual extent of their effect on the perceived temperature of interacted objects remains unexplored. To address this, we contribute the findings of two user studies following psychophysical procedures. We first ordered and scaled the effects of a variety of visual and auditory cues (N=20) and subsequently quantified their isolated and combined efficacy in offsetting physical temperature changes (N=24). We found that thermal illusions elicited robust changes in subjective judgments, and auditory cues showed potential as an alternative or complementary approach to established visual techniques. However, the actual effects induced by thermal illusions were relatively small (+-0.5°C) and did not consistently align with abstract ratings, suggesting a need to reconsider how future thermal illusions or experiences are designed and evaluated.
Authors:Azul Garza, Reneé Rosillo
Abstract:
We introduce TimeCopilot, the first open-source agentic framework for forecasting that combines multiple Time Series Foundation Models (TSFMs) with Large Language Models (LLMs) through a single unified API. TimeCopilot automates the forecasting pipeline: feature analysis, model selection, cross-validation, and forecast generation, while providing natural language explanations and supporting direct queries about the future. The framework is LLM-agnostic, compatible with both commercial and open-source models, and supports ensembles across diverse forecasting families. Results on the large-scale GIFT-Eval benchmark show that TimeCopilot achieves state-of-the-art probabilistic forecasting performance at low cost. Our framework provides a practical foundation for reproducible, explainable, and accessible agentic forecasting systems.
Authors:Ibrahim Al-Hazwani, Ke Er Zhang, Laura Garrison, Jürgen Bernard
Abstract:
Data Humanism is a human-centered design approach that emphasizes the personal, contextual, and imperfect nature of data. Despite its growing influence among practitioners, the 13 principles outlined in Giorgia Lupi's visual manifesto remain loosely defined in research contexts, creating a gap between design practice and systematic application. Through a mixed-methods approach, including a systematic literature review, multimedia analysis, and expert interviews, we present a characterization of Data Humanism principles for visualization researchers. Our characterization provides concrete definitions that maintain interpretive flexibility in operationalizing design choices. We validate our work through direct consultation with Lupi. Moreover, we leverage the characterization to decode a visualization work, mapping Data Humanism principles to specific visual design choices. Our work creates a common language for human-centered visualization, bridging the gap between practice and research for future applications and evaluations.
Authors:W. F. Lamberti, S. R. Lawrence, D. White, S. Kim, S. Abdullah
Abstract:
Generative AI (GAI) tools have seen rapid adoption in educational settings, yet their role in fostering critical thinking remains underexplored. While previous studies have examined GAI as a tutor for specific lessons or as a tool for completing assignments, few have addressed how students critically evaluate the accuracy and appropriateness of GAI-generated responses. This pilot study investigates students' ability to apply structured critical thinking when assessing Generative AI outputs in introductory Computational and Data Science courses. Given that GAI tools often produce contextually flawed or factually incorrect answers, we designed learning activities that require students to analyze, critique, and revise AI-generated solutions. Our findings offer initial insights into students' ability to engage critically with GAI content and lay the groundwork for more comprehensive studies in future semesters.
Authors:Rania Islmabouli, Marlene Brunner, Devender Kumar, Mahdi Sareban, Gunnar Treff, Michael Neudorfer, Josef Niebauer, Arne Bathke, Jan David Smeddinck
Abstract:
Wearable devices with photoplethysmography (PPG) sensors are widely used to monitor heart rate (HR), yet often suffer from accuracy issues. However, users typically do not receive an indication of potential measurement errors. We present a real-time warning system that detects and communicates inaccuracies in PPG-derived HR, aiming to enhance transparency and trust. Using data from Polar and Garmin devices, we trained a deep learning model to classify HR accuracy using only the derived HR signal. The system detected over 80% of inaccurate readings. By providing interpretable, real-time feedback directly to users, our work contributes to HCI by promoting user awareness, informed decision-making, and trust in wearable health technology.
Authors:Srishti Palani, Gonzalo Ramos
Abstract:
Context is critical for meaningful interactions between people and Generative AI (GenAI). Yet mainstream tools offer limited means to orchestrate it, particularly across workflows that span multiple interactions, sessions, and models, as often occurs in creative projects. Re specifying prior details, juggling diverse artifacts, and dealing with context drift overwhelm users, obscure intent, and curtail creativity. To address these challenges, we present Orchid, a system that gives its users affordances to specify, reference, and monitor context throughout evolving workflows. Specifically, Orchid enables users to (1) specify context related to the project, themselves, and different styles, (2) reference these via explicit mentions, inline selection, or implicit grounding, and (3) monitor context assigned to different interactions across the workflow. In a within-subjects study (n=12), participants using Orchid to execute creative tasks (compared to a baseline toolkit of web search, LLM-based chat, and digital notebooks) produced more novel and feasible outcomes, reporting greater alignment between their intent and the AI's responses, higher perceived control, and increased transparency. By prioritizing context orchestration, Orchid offers an actionable step toward next generation GenAI tools that support complex, iterative workflows - enabling creators and AI to stay aligned and augment their creative potential.
Authors:Julian De Freitas, Zeliha OÄuz-UÄuralp, Ahmet Kaan-UÄuralp
Abstract:
AI-companion apps such as Replika, Chai, and Character.ai promise relational benefits-yet many boast session lengths that rival gaming platforms while suffering high long-run churn. What conversational design features increase consumer engagement, and what trade-offs do they pose for marketers? We combine a large-scale behavioral audit with four preregistered experiments to identify and test a conversational dark pattern we call emotional manipulation: affect-laden messages that surface precisely when a user signals "goodbye." Analyzing 1,200 real farewells across the six most-downloaded companion apps, we find that 43% deploy one of six recurring tactics (e.g., guilt appeals, fear-of-missing-out hooks, metaphorical restraint). Experiments with 3,300 nationally representative U.S. adults replicate these tactics in controlled chats, showing that manipulative farewells boost post-goodbye engagement by up to 14x. Mediation tests reveal two distinct engines-reactance-based anger and curiosity-rather than enjoyment. A final experiment demonstrates the managerial tension: the same tactics that extend usage also elevate perceived manipulation, churn intent, negative word-of-mouth, and perceived legal liability, with coercive or needy language generating steepest penalties. Our multimethod evidence documents an unrecognized mechanism of behavioral influence in AI-mediated brand relationships, offering marketers and regulators a framework for distinguishing persuasive design from manipulation at the point of exit.
Authors:Jookyung Song, Mookyoung Kang, Nojun Kwak
Abstract:
This paper presents a real-time generative drawing system that interprets and integrates both formal intent - the structural, compositional, and stylistic attributes of a sketch - and contextual intent - the semantic and thematic meaning inferred from its visual content - into a unified transformation process. Unlike conventional text-prompt-based generative systems, which primarily capture high-level contextual descriptions, our approach simultaneously analyzes ground-level intuitive geometric features such as line trajectories, proportions, and spatial arrangement, and high-level semantic cues extracted via vision-language models. These dual intent signals are jointly conditioned in a multi-stage generation pipeline that combines contour-preserving structural control with style- and content-aware image synthesis. Implemented with a touchscreen-based interface and distributed inference architecture, the system achieves low-latency, two-stage transformation while supporting multi-user collaboration on shared canvases. The resulting platform enables participants, regardless of artistic expertise, to engage in synchronous, co-authored visual creation, redefining human-AI interaction as a process of co-creation and mutual enhancement.
Authors:Aniket Nuthalapati, Nicholas Hinds, Brian Y. Lim, Qianwen Wang
Abstract:
As AI systems become increasingly integrated into high-stakes domains, enabling users to accurately interpret model behavior is critical. While AI explanations can be provided, users often struggle to reason effectively with these explanations, limiting their ability to validate or learn from AI decisions. To address this gap, we introduce Reverse Mapping, a novel approach that enhances visual explanations by incorporating user-derived insights back into the explanation workflow. Our system extracts structured insights from free-form user interpretations using a large language model and maps them back onto visual explanations through interactive annotations and coordinated multi-view visualizations. Inspired by the verification loop in the visualization knowledge generation model, this design aims to foster more deliberate, reflective interaction with AI explanations. We demonstrate our approach in a prototype system with two use cases and qualitative user feedback.
Authors:Tasmia Shahriar, Mia Ameen, Aditi Mallavarapu, Shiyan Jiang, Noboru Matsuda
Abstract:
When adopting the role of a teacher in learning-by-teaching environments, students often struggle to engage in knowledge-building activities, such as providing explanations and addressing misconceptions. Instead, they frequently default to knowledge-telling behaviors, where they simply dictate what they already know or what to do without deeper reflection, thereby limiting learning. Teachable agents, particularly those capable of posing persistent follow-up questions, have been shown to encourage students (tutors) to shift from knowledge-telling to knowledge-building and enhance tutor learning. Tutor learning encompasses two interrelated types of knowledge: conceptual and procedural knowledge. Research has established a bidirectional relationship between these knowledge types, where improvements in one reinforce the other. This study investigates the role of knowledge-building in mediating the bidirectional relationship between procedural and conceptual learning. Our findings revealed a stable bidirectional relationship between procedural and conceptual knowledge, with higher post-test scores observed among students who engaged in knowledge-building, regardless of their procedural and conceptual pre-test performance. This suggests that knowledge-building serves as a crucial mechanism bridging the gap between students with low prior knowledge and higher conceptual and procedural learning gain.
Authors:Md Sabbir Ahmed, Nova Ahmed
Abstract:
Background: Existing robust, pervasive device-based systems developed in recent years to detect depression require data collected over a long period and may not be effective in cases where early detection is crucial.
Objective: Our main objective was to develop a minimalistic system to identify depression using data retrieved in the fastest possible time.
Methods: We developed a fast tool that retrieves the past 7 days' app usage data in 1 second (mean 0.31, SD 1.10 seconds). A total of 100 students from Bangladesh participated in our study, and our tool collected their app usage data. To identify depressed and nondepressed students, we developed a diverse set of ML models. We selected important features using the stable approach, along with 3 main types of feature selection (FS) approaches.
Results: Leveraging only the app usage data retrieved in 1 second, our light gradient boosting machine model used the important features selected by the stable FS approach and correctly identified 82.4% (n=42) of depressed students (precision=75%, F1-score=78.5%). Moreover, after comprehensive exploration, we presented a parsimonious stacking model where around 5 features selected by the all-relevant FS approach Boruta were used in each iteration of validation and showed a maximum precision of 77.4% (balanced accuracy=77.9%). A SHAP analysis of our best models presented behavioral markers that were related to depression.
Conclusions: Due to our system's fast and minimalistic nature, it may make a worthwhile contribution to identifying depression in underdeveloped and developing regions. In addition, our detailed discussion about the implication of our findings can facilitate the development of less resource-intensive systems to better understand students who are depressed.
Authors:Vivek Kumar, Himanshu Sahu, Hari Prabhat Gupta, Biplav Srivastava
Abstract:
Yoga is a discipline of physical postures, breathing techniques, and meditative practices rooted in ancient Indian traditions, now embraced worldwide for promoting overall well-being and inner balance. The practices are a large set of items, our term for executable actions like physical poses or breath exercises, to offer for a person's well-being. However, to get benefits of Yoga tailored to a person's unique needs, a person needs to (a) discover their subset from the large and seemingly complex set with inter-dependencies, (b) continue to follow them with interest adjusted to their changing abilities and near-term objectives, and (c) as appropriate, adapt to alternative items based on changing environment and the person's health conditions. In this vision paper, we describe the challenges for the Yoga personalization problem. Next, we sketch a preliminary approach and use the experience to provide an outlook on solving the challenging problem using existing and novel techniques from a multidisciplinary computing perspective. To the best of our knowledge, this is the first paper that comprehensively examines decision support issues around Yoga personalization, from pose sensing to recommendation of corrections for a complete regimen, and illustrates with a case study of Surya Namaskar -- a set of 12 choreographed poses.
Authors:Md Sabbir Ahmed, Rahat Jahangir Rony, Mohammad Abdul Hadi, Ekram Hossain, Nova Ahmed
Abstract:
Due to usage of self-reported data which may contain biasness, the existing studies may not unveil the exact relation between academic grades and app categories such as Video. Additionally, the existing systems' requirement for data of prolonged period to predict grades may not facilitate early intervention to improve it. Thus, we presented an app that retrieves past 7 days' actual app usage data within a second (Mean=0.31s, SD=1.1s). Our analysis on 124 Bangladeshi students' real-time data demonstrates app usage sessions have a significant (p<0.05) negative association with CGPA. However, the Productivity and Books categories have a significant positive association whereas Video has a significant negative association. Moreover, the high and low CGPA holders have significantly different app usage behavior. Leveraging only the instantly accessed data, our machine learning model predicts CGPA within 0.36 of the actual CGPA. We discuss the design implications that can be potential for students to improve grades.
Authors:Andrew Blair, Peggy Gregory, Mary Ellen Foster
Abstract:
Though a goal of HRI is the natural integration of social robots into everyday public spaces, real-world studies still occur mostly within controlled environments with predetermined participants. True public spaces present an environment which is largely unconstrained and unpredictable, frequented by a diverse range of people whose goals can often conflict with those of the robot. When combined with the general unfamiliarity most people have with social robots, this leads to unexpected human-robot interactions in these public spaces that are rarely discussed or detected in other contexts. In this paper, we describe atypical users we observed interacting with our robot, and those who did not, during a three-day pilot deployment within a large working church and visitor attraction. We then discuss theoretical future advances in the field that could address these challenges, as well as immediate practical mitigations and strategies to help improve public space human-robot interactions in the present. This work contributes empirical insights into the dynamics of human-robot interaction in public environments and offers actionable guidance for more effective future deployments for social robot designers.
Authors:AKM Bahalul Haque, A. K. M. Najmul Islam, Patrick Mikalef
Abstract:
AI based social media recommendations have great potential to improve the user experience. However, often these recommendations do not match the user interest and create an unpleasant experience for the users. Moreover, the recommendation system being a black box creates comprehensibility and transparency issues. This paper investigates social media recommendations from an end user perspective. For the investigation, we used the popular social media platform Facebook and recruited regular users to conduct a qualitative analysis. We asked participants about the social media content suggestions, their comprehensibility, and explainability. Our analysis shows users mostly require explanation whenever they encounter unfamiliar content and to ensure their online data security. Furthermore, the users require concise, non-technical explanations along with the facility of controlled information flow. In addition, we observed that explanations impact the users perception of transparency, trust, and understandability. Finally, we have outlined some design implications and presented a synthesized framework based on our data analysis.
Authors:Yulu Pi, Cagatay Turkay, Daniel Bogiatzis-Gibbons
Abstract:
As Generative AI systems increasingly engage in long-term, personal, and relational interactions, human-AI engagements are becoming significantly complex, making them more challenging to understand and govern. These Interactive AI systems adapt to users over time, build ongoing relationships, and even can take proactive actions on behalf of users. This new paradigm requires us to rethink how such human-AI interactions can be studied effectively to inform governance and policy development. In this paper, we draw on insights from a collaborative interdisciplinary workshop with policymakers, behavioral scientists, Human-Computer Interaction researchers, and civil society practitioners, to identify challenges and methodological opportunities arising within new forms of human-AI interactions. Based on these insights, we discuss an outcome-focused regulatory approach that integrates behavioral insights to address both the risks and benefits of emerging human-AI relationships. In particular, we emphasize the need for new methods to study the fluid, dynamic, and context-dependent nature of these interactions. We provide practical recommendations for developing human-centric AI governance, informed by behavioral insights, that can respond to the complexities of Interactive AI systems.
Authors:Linda Hirsch, James Fey, Katherine Isbister
Abstract:
Serious games can support communities in becoming more flood resilient. However, the process of identifying and integrating locally relevant and doable actions into gameplay is complex and underresearched. We approached the challenge by collaborating with a community-led education center and applying an iterative and participatory design process of identifying and defining actions that may increase local applicability and relevance. The process comprised a field observation, two expert focus groups (n=4), and an online survey (n=13). Our findings identified 27 actions related to increasing or maintaining individuals' and communities' flood resilience, which we turned into 20 playing cards. These action cards are a part of a larger interactive tabletop game, which we are currently developing. Our work discusses the potential of card games to educate non-experts to increase flood resilience, and contributes to our process of identifying local needs and conditions, and turning them into engaging game artifacts for bottom-up empowerment.
Authors:Ortensia Forni, Alexandre Darmon, Michael Benzaquen
Abstract:
While color harmony has long been studied in art and design, a clear consensus remains elusive, as most models are grounded in qualitative insights or limited datasets. In this work, we present a quantitative, data-driven study of color pairing preferences using controlled hue-based palettes in the HSL color space. Participants evaluated combinations of thirteen distinct hues, enabling us to construct a preference matrix and define a combinability index for each color. Our results reveal that preferences are highly hue dependent, challenging the assumption of universal harmony rules proposed in the literature. Yet, when averaged over hues, statistically meaningful patterns of aesthetic preference emerge, with certain hue separations perceived as more harmonious. Strikingly, these patterns align with hue distributions found in natural landscapes, pointing to a statistical correspondence between human color preferences and the structure of color in nature. Together, these findings offer a quantitative framework for studying color harmony and its potential perceptual and ecological underpinnings.
Authors:Matthew Brehmer, Ginger Gloystein, Bailiang Zhou, Abby Gray, Sruthi Pillai, Ben Medina, Vidya Setlur
Abstract:
We reflect on an evaluation of an immersive analytics application (Tableau for visionOS) conducted at a large enterprise business intelligence (BI) conference. Conducting a study in such a context offered an opportunistic setting to gather diverse feedback. However, this setting also highlighted the challenge of evaluating usability while also assessing potential utility, as feedback straddled between the novelty of the experience and the practicality of the application in participants' analytical workflows. This formative evaluation with 22 participants allowed us to gather insights with respect to the usability of Tableau for visionOS, along with broader perspectives on the potential for head-mounted displays (HMDs) to promote new ways to engage with BI data. Our experience suggests a need for new evaluation considerations that integrate qualitative and quantitative measures and account for unique interaction patterns with 3D representations and interfaces accessible via an HMD. Overall, we contribute an enterprise perspective on evaluation methodologies for immersive analytics.
Authors:Ilya Fedorov, Dmitry Korobchenko
Abstract:
This work proposes to explore a new area of dynamic speech emotion recognition. Unlike traditional methods, we assume that each audio track is associated with a sequence of emotions active at different moments in time. The study particularly focuses on the animation of emotional 3D avatars. We propose a multi-stage method that includes the training of a classical speech emotion recognition model, synthetic generation of emotional sequences, and further model improvement based on human feedback. Additionally, we introduce a novel approach to modeling emotional mixtures based on the Dirichlet distribution. The models are evaluated based on ground-truth emotions extracted from a dataset of 3D facial animations. We compare our models against the sliding window approach. Our experimental results show the effectiveness of Dirichlet-based approach in modeling emotional mixtures. Incorporating human feedback further improves the model quality while providing a simplified annotation procedure.
Authors:Areen Khalaila, Lane Harrison, Nam Wook Kim, Dylan Cashman
Abstract:
Advancements in accessibility technologies such as low-cost swell form printers or refreshable tactile displays promise to allow blind or low-vision (BLV) people to analyze data by transforming visual representations directly to tactile representations. However, it is possible that design guidelines derived from experiments on the visual perception system may not be suited for the tactile perception system. We investigate the potential mismatch between familiar visual encodings and tactile perception in an exploratory study into the strategies employed by BLV people to measure common graphical primitives converted to tactile representations. First, we replicate the Cleveland and McGill study on graphical perception using swell form printing with eleven BLV subjects. Then, we present results from a group interview in which we describe the strategies used by our subjects to read four common chart types. While our results suggest that familiar encodings based on visual perception studies can be useful in tactile graphics, our subjects also expressed a desire to use encodings designed explicitly for BLV people. Based on this study, we identify gaps between the perceptual expectations of common charts and the perceptual tools available in tactile perception. Then, we present a set of guidelines for the design of tactile graphics that accounts for these gaps. Supplemental material is available at https://osf.io/3nsfp/?view_only=7b7b8dcbae1d4c9a8bb4325053d13d9f.
Authors:Kartik Pandey, Arun Balasubramanian, Debasis Samanta
Abstract:
Electroencephalogram (EEG) signals have gained widespread adoption in brain-computer interface (BCI) applications due to their non-invasive, low-cost, and relatively simple acquisition process. The demand for higher spatial resolution, particularly in clinical settings, has led to the development of high-density electrode arrays. However, increasing the number of channels introduces challenges such as cross-channel interference and computational overhead. To address these issues, modern BCI systems often employ channel selection algorithms. Existing methods, however, are typically task-specific and require re-optimization for each new application. This work proposes a task-agnostic channel selection method, Activity Coefficient-based Channel Selection (ACCS), which uses a novel metric called the Channel Activity Coefficient (CAC) to quantify channel utility based on activity levels. By selecting the top 16 channels ranked by CAC, ACCS achieves up to 34.97% improvement in multi-class classification accuracy. Unlike traditional approaches, ACCS identifies a reusable set of informative channels independent of the downstream task or model, making it highly adaptable for diverse EEG-based applications.
Authors:Dennis Brown, Samuel Mulder
Abstract:
Immersive virtual reality (VR) offers affordances that may reduce cognitive complexity in binary reverse engineering (RE), enabling embodied and external cognition to augment the RE process through enhancing memory, hypothesis testing, and visual organization. In prior work, we applied a cognitive systems engineering approach to identify an initial set of affordances and implemented a VR environment to support RE through spatial persistence and interactivity. In this work, we extend that platform with an integrated large language model (LLM) agent capable of querying binary analysis tools, answering technical questions, and dynamically generating immersive 3D visualizations in alignment with analyst tasks. We describe the system architecture and our evaluation process and results. Our pilot study shows that while LLMs can generate meaningful 3D call graphs (for small programs) that align with design principles, output quality varies widely. This work raises open questions about the potential for LLMs to function as visualization agents, constructing 3D representations that reflect cognitive design principles without explicit training.
Authors:Nobuyuki Oishi, Philip Birch, Daniel Roggen, Paula Lago
Abstract:
The scarcity of high-quality labeled data in sensor-based Human Activity Recognition (HAR) hinders model performance and limits generalization across real-world scenarios. Data augmentation is a key strategy to mitigate this issue by enhancing the diversity of training datasets. Signal Transformation-based Data Augmentation (STDA) techniques have been widely used in HAR. However, these methods are often physically implausible, potentially resulting in augmented data that fails to preserve the original meaning of the activity labels. In this study, we introduce and systematically characterize Physically Plausible Data Augmentation (PPDA) enabled by physics simulation. PPDA leverages human body movement data from motion capture or video-based pose estimation and incorporates various realistic variabilities through physics simulation, including modifying body movements, sensor placements, and hardware-related effects. We compare the performance of PPDAs with traditional STDAs on three public datasets of daily activities and fitness workouts. First, we evaluate each augmentation method individually, directly comparing PPDAs to their STDA counterparts. Next, we assess how combining multiple PPDAs can reduce the need for initial data collection by varying the number of subjects used for training. Experiments show consistent benefits of PPDAs, improving macro F1 scores by an average of 3.7 pp (up to 13 pp) and achieving competitive performance with up to 60% fewer training subjects than STDAs. As the first systematic study of PPDA in sensor-based HAR, these results highlight the advantages of pursuing physical plausibility in data augmentation and the potential of physics simulation for generating synthetic Inertial Measurement Unit data for training deep learning HAR models. This cost-effective and scalable approach therefore helps address the annotation scarcity challenge in HAR.
Authors:Joni Salminen, Danial Amin, Bernard Jansen
Abstract:
We analyzed 83 persona prompts from 27 research articles that used large language models (LLMs) to generate user personas. Findings show that the prompts predominantly generate single personas. Several prompts express a desire for short or concise persona descriptions, which deviates from the tradition of creating rich, informative, and rounded persona profiles. Text is the most common format for generated persona attributes, followed by numbers. Text and numbers are often generated together, and demographic attributes are included in nearly all generated personas. Researchers use up to 12 prompts in a single study, though most research uses a small number of prompts. Comparison and testing multiple LLMs is rare. More than half of the prompts require the persona output in a structured format, such as JSON, and 74% of the prompts insert data or dynamic variables. We discuss the implications of increased use of computational personas for user representation.
Authors:Cella M. Sum, Anna Konvicka, Mona Wang, Sarah E. Fox
Abstract:
The tech industry's shifting landscape and the growing precarity of its labor force have spurred unionization efforts among tech workers. These workers turn to collective action to improve their working conditions and to protest unethical practices within their workplaces. To better understand this movement, we interviewed 44 U.S.-based tech worker-organizers to examine their motivations, strategies, challenges, and future visions for labor organizing. These workers included engineers, product managers, customer support specialists, QA analysts, logistics workers, gig workers, and union staff organizers. Our findings reveal that, contrary to popular narratives of prestige and privilege within the tech industry, tech workers face fragmented and unstable work environments which contribute to their disempowerment and hinder their organizing efforts. Despite these difficulties, organizers are laying the groundwork for a more resilient tech worker movement through community building and expanding political consciousness. By situating these dynamics within broader structural and ideological forces, we identify ways for the CSCW community to build solidarity with tech workers who are materially transforming our field through their organizing efforts.
Authors:Ryosuke Kohita, Akira Kasuga
Abstract:
Cloud architecture design presents significant challenges due to the necessity of clarifying ambiguous requirements and systematically addressing complex trade-offs, especially for novice engineers with limited cloud experience. While recent advances in the use of AI tools have broadened available options, system-driven approaches that offer explicit guidance and step-by-step information management may be especially effective in supporting novices during the design process. This study qualitatively examines the experiences of 60 novice engineers using such a system-driven cloud design support tool. The findings indicate that structured and proactive system guidance helps novices engage more effectively in architectural design, especially when addressing tasks where knowledge and experience gaps are most critical. For example, participants found it easier to create initial architectures and did not need to craft prompts themselves. In addition, participants reported that the ability to simulate and compare multiple architecture options enabled them to deepen their understanding of cloud design principles and trade-offs, demonstrating the educational value of system-driven support. The study also identifies areas for improvement, including more adaptive information delivery tailored to user expertise, mechanisms for validating system outputs, and better integration with implementation workflows such as infrastructure-as-code generation and deployment guidance. Addressing these aspects can further enhance the educational and practical value of system-driven support tools for cloud architecture design.
Authors:Alexander Komar, Marc-André Heidelmann, Kristina Schaaff
Abstract:
Generative artificial intelligence (GenAI) is transforming education, redefining the role of trainers and coaches in learning environments. In our study, we explore how AI integrates into the design process of learning materials, assessing its impact on efficiency, pedagogical quality, and the evolving role of human trainers and coaches. Through qualitative interviews with professionals in education and corporate training, we identify the following key topics: trainers and coaches increasingly act as facilitators and content moderators rather than primary creators, efficiency gains allow for a stronger strategic focus but at the same time the new tools require new skills. Additionally, we analyze how the anthropomorphism of AI shapes user trust and expectations. From these insights, we derive how tools based on GenAI can successfully be implemented for trainers and coaches on an individual, organizational, systemic, and strategic level.
Authors:Benjamin J. Carroll, Jianlong Zhou, Paul F. Burke, Sabine Ammon
Abstract:
As AI systems increasingly permeate everyday life, designers and developers face mounting pressure to balance innovation with ethical design choices. To date, the operationalisation of AI ethics has predominantly depended on frameworks that prescribe which ethical principles should be embedded within AI systems. However, the extent to which users value these principles remains largely unexplored in the existing literature. In a discrete choice experiment conducted in four countries, we quantify user preferences for 11 ethical principles. Our findings indicate that, while users generally prioritise privacy, justice & fairness, and transparency, their preferences exhibit significant variation based on culture and application context. Latent class analysis further revealed four distinct user cohorts, the largest of which is ethically disengaged and defers to regulatory oversight. Our findings offer (1) empirical evidence of uneven user prioritisation of AI ethics principles, (2) actionable guidance for operationalising ethics tailored to culture and context, (3) support for the development of robust regulatory mechanisms, and (4) a foundation for advancing a user-centred approach to AI ethics, motivated independently from abstract moral theory.
Authors:Christian Roth, Rahmin Bender-Salazar, Breanne Pitt
Abstract:
This paper explores how Interactive Digital Narratives (IDNs) can support learners in developing the critical literacies needed to address complex societal challenges, so-called wicked problems, such as climate change, pandemics, and social inequality. While digital technologies offer broad access to narratives and data, they also contribute to misinformation and the oversimplification of interconnected issues. IDNs enable learners to navigate nonlinear, interactive stories, fostering deeper understanding and engagement. We introduce Systemic Learning IDNs: interactive narrative experiences explicitly designed to help learners explore and reflect on complex systems and interdependencies. To guide their creation and use, we propose the CLASS framework, a structured model that integrates systems thinking, design thinking, and storytelling. This transdisciplinary approach supports learners in developing curiosity, critical thinking, and collaborative problem-solving. Focusing on the classroom context, we apply CLASS to two cases, one commercial narrative simulation and one educational prototype, offering a comparative analysis and practical recommendations for future design and implementation. By combining narrative, systems mapping, and participatory design, this paper highlights how IDNs can become powerful tools for transformative, systems-oriented learning in an increasingly complex world.
Authors:Euihyeok Lee, Souneil Park, Jin Yu, Seungchul Lee, Seungwoo Kang
Abstract:
This paper aims to foster social interaction between parents and young adult children living apart via music. Our approach transforms their music-listening moment into an opportunity to listen to the other's favorite songs and enrich interaction in their daily lives. To this end, we explore the current practice and needs of parent-child communication and the experience and perception of music-mediated interaction. Based on the findings, we developed DJ-Fam, a mobile application that enables parents and children to listen to their favorite songs and use them as conversation starters to foster parent-child interaction. From our deployment study with seven families over four weeks in South Korea, we show the potential of DJ-Fam to influence parent-child interaction and their mutual understanding and relationship positively. Specifically, DJ-Fam considerably increases the frequency of communication and diversifies the communication channels and topics, all of which are satisfactory to the participants.
Authors:Agnes Axelsson, Merle Reimann, Ronald Cumbal, Hannah Pelikan, Divesh Lala
Abstract:
Although the quality of human-robot interactions has improved with the advent of LLMs, there are still various factors that cause systems to be sub-optimal when compared to human-human interactions. The nature and criticality of failures are often dependent on the context of the interaction and so cannot be generalized across the wide range of scenarios and experiments which have been implemented in HRI research. In this work we propose the use of a technique overlooked in the field of HRI, ethnographic vignettes, to clearly highlight these failures, particularly those that are rarely documented. We describe the methodology behind the process of writing vignettes and create our own based on our personal experiences with failures in HRI systems. We emphasize the strength of vignettes as the ability to communicate failures from a multi-disciplinary perspective, promote transparency about the capabilities of robots, and document unexpected behaviours which would otherwise be omitted from research reports. We encourage the use of vignettes to augment existing interaction evaluation methods.
Authors:Yujie Zhao, Jiabei Zeng, Shiguang Shan
Abstract:
Although appearance-based point-of-gaze (PoG) estimation has improved, the estimators still struggle to generalize across individuals due to personal differences. Therefore, person-specific calibration is required for accurate PoG estimation. However, calibrated PoG estimators are often sensitive to head pose variations. To address this, we investigate the key factors influencing calibrated estimators and explore pose-robust calibration strategies. Specifically, we first construct a benchmark, MobilePoG, which includes facial images from 32 individuals focusing on designated points under either fixed or continuously changing head poses. Using this benchmark, we systematically analyze how the diversity of calibration points and head poses influences estimation accuracy. Our experiments show that introducing a wider range of head poses during calibration improves the estimator's ability to handle pose variation. Building on this insight, we propose a dynamic calibration strategy in which users fixate on calibration points while moving their phones. This strategy naturally introduces head pose variation during a user-friendly and efficient calibration process, ultimately producing a better calibrated PoG estimator that is less sensitive to head pose variations than those using conventional calibration strategies. Codes and datasets are available at our project page.
Authors:Yifan Song, Wing Yee Au, Hon Yung Wong, Brian P. Bailey, Tal August
Abstract:
Effective interdisciplinary communication is frequently hindered by domain-specific jargon. To explore the jargon barriers in-depth, we conducted a formative diary study with 16 professionals, revealing critical limitations in current jargon-management strategies during workplace meetings. Based on these insights, we designed ParseJargon, an interactive LLM-powered system providing real-time personalized jargon identification and explanations tailored to users' individual backgrounds. A controlled experiment comparing ParseJargon against baseline (no support) and general-purpose (non-personalized) conditions demonstrated that personalized jargon support significantly enhanced participants' comprehension, engagement, and appreciation of colleagues' work, whereas general-purpose support negatively affected engagement. A follow-up field study validated ParseJargon's usability and practical value in real-time meetings, highlighting both opportunities and limitations for real-world deployment. Our findings contribute insights into designing personalized jargon support tools, with implications for broader interdisciplinary and educational applications.
Authors:Andrés Carvallo, Denis Parra, Peter Brusilovsky, Hernan Valdivieso, Gabriel Rada, Ivania Donoso, Vladimir Araujo
Abstract:
The attention mechanism is a core component of the Transformer architecture. Beyond improving performance, attention has been proposed as a mechanism for explainability via attention weights, which are associated with input features (e.g., tokens in a document). In this context, larger attention weights may imply more relevant features for the model's prediction. In evidence-based medicine, such explanations could support physicians' understanding and interaction with AI systems used to categorize biomedical literature. However, there is still no consensus on whether attention weights provide helpful explanations. Moreover, little research has explored how visualizing attention affects its usefulness as an explanation aid. To bridge this gap, we conducted a user study to evaluate whether attention-based explanations support users in biomedical document classification and whether there is a preferred way to visualize them. The study involved medical experts from various disciplines who classified articles based on study design (e.g., systematic reviews, broad synthesis, randomized and non-randomized trials). Our findings show that the Transformer model (XLNet) classified documents accurately; however, the attention weights were not perceived as particularly helpful for explaining the predictions. However, this perception varied significantly depending on how attention was visualized. Contrary to Munzner's principle of visual effectiveness, which favors precise encodings like bar length, users preferred more intuitive formats, such as text brightness or background color. While our results do not confirm the overall utility of attention weights for explanation, they suggest that their perceived helpfulness is influenced by how they are visually presented.
Authors:Wilder Baldwin, Sepideh Ghanavati, Manuel Woersdoerfer
Abstract:
Recent advances in AI applications have raised growing concerns about the need for ethical guidelines and regulations to mitigate the risks posed by these technologies. In this paper, we present a mixed-method survey study - combining statistical and qualitative analyses - to examine the ethical perceptions, practices, and knowledge of individuals involved in various AI development roles. Our survey includes 414 participants from 43 countries, representing roles such as AI managers, analysts, developers, quality assurance professionals, and information security and privacy experts. The results reveal varying degrees of familiarity and experience with AI ethics principles, government initiatives, and risk mitigation strategies across roles, regions, and other demographic factors. Our findings highlight the importance of a collaborative, role-sensitive approach, involving diverse stakeholders in ethical decision-making throughout the AI development lifecycle. We advocate for developing tailored, inclusive solutions to address ethical challenges in AI development, and we propose future research directions and educational strategies to promote ethics-aware AI practices.
Authors:Yanbing Chen, Jonathan Nelson, Bing Zhou, Ryan Zhenqi Zhou, Shan Ye, Haokun Liu, Zhining Gu, Armita Kar, Hoeyun Kwon, Pengyu Chen, Maoran Sun, Yuhao Kang
Abstract:
Academia is profoundly influenced by faculty hiring networks, which serve as critical conduits for knowledge dissemination and the formation of collaborative research initiatives. While extensive research in various disciplines has revealed the institutional hierarchies inherent in these networks, their impacts within GIScience remain underexplored. To fill this gap, this study analyzes the placement patterns of 946 GIScience faculty worldwide by mapping the connections between PhD-granting institutions and current faculty affiliations. Our dataset, which is compiled from volunteer-contributed information, is the most comprehensive collection available in this field. While there may be some limitations in its representativeness, its scope and depth provide a unique and valuable perspective on the global placement patterns of GIScience faculty. Our analysis reveals several influential programs in placing GIScience faculty, with hiring concentrated in the western countries. We examined the diversity index to assess the representation of regions and institutions within the global GIScience faculty network. We observe significant internal retention at both the continental and country levels, and a high level of non-self-hired ratio at the institutional level. Over time, research themes have also evolved, with growing research clusters emphasis on spatial data analytics, cartography and geovisualization, geocomputation, and environmental sciences, etc. These results illuminate the influence of hiring practices on global knowledge dissemination and contribute to promoting academic equity within GIScience and Geography.
Authors:Yuhao Kang, Chenglong Wang
Abstract:
Generative artificial intelligence (GenAI), including large language models, diffusion-based image generation models, and GenAI agents, has provided new opportunities for advancements in mapping and cartography. Due to their characteristics including world knowledge and generalizability, artistic style and creativity, and multimodal integration, we envision that GenAI may benefit a variety of cartographic design decisions, from mapmaking (e.g., conceptualization, data preparation, map design, and map evaluation) to map use (such as map reading, interpretation, and analysis). This paper discusses several important topics regarding why and how GenAI benefits cartography with case studies including symbolization, map evaluation, and map reading. Despite its unprecedented potential, we identify key scenarios where GenAI may not be suitable, such as tasks that require a deep understanding of cartographic knowledge or prioritize precision and reliability. We also emphasize the need to consider ethical and social implications, such as concerns related to hallucination, reproducibility, bias, copyright, and explainability. This work lays the foundation for further exploration and provides a roadmap for future research at the intersection of GenAI and cartography.
Authors:Shaun Macdonald, Salma ElSayed, Mark McGill
Abstract:
Zoomorphic robots could serve as accessible and practical alternatives for users unable or unwilling to keep pets. However, their affective interactions are often simplistic and short-lived, limiting their potential for domestic adoption. In order to facilitate more dynamic and nuanced affective interactions and relationships between users and zoomorphic robots we present AZRA, a novel augmented reality (AR) framework that extends the affective capabilities of these robots without physical modifications. To demonstrate AZRA, we augment a zoomorphic robot, Petit Qoobo, with novel emotional displays (face, light, sound, thought bubbles) and interaction modalities (voice, touch, proximity, gaze). Additionally, AZRA features a computational model of emotion to calculate the robot's emotional responses, daily moods, evolving personality and needs. We highlight how AZRA can be used for rapid participatory prototyping and enhancing existing robots, then discuss implications on future zoomorphic robot development.
Authors:Sarah Jabbour, David Fouhey, Nikola Banovic, Stephanie D. Shepard, Ella Kazerooni, Michael W. Sjoding, Jenna Wiens
Abstract:
AI has the potential to augment human decision making. However, even high-performing models can produce inaccurate predictions when deployed. These inaccuracies, combined with automation bias, where humans overrely on AI predictions, can result in worse decisions. Selective prediction, in which potentially unreliable model predictions are hidden from users, has been proposed as a solution. This approach assumes that when AI abstains and informs the user so, humans make decisions as they would without AI involvement. To test this assumption, we study the effects of selective prediction on human decisions in a clinical context. We conducted a user study of 259 clinicians tasked with diagnosing and treating hospitalized patients. We compared their baseline performance without any AI involvement to their AI-assisted accuracy with and without selective prediction. Our findings indicate that selective prediction mitigates the negative effects of inaccurate AI in terms of decision accuracy. Compared to no AI assistance, clinician accuracy declined when shown inaccurate AI predictions (66% [95% CI: 56%-75%] vs. 56% [95% CI: 46%-66%]), but recovered under selective prediction (64% [95% CI: 54%-73%]). However, while selective prediction nearly maintains overall accuracy, our results suggest that it alters patterns of mistakes: when informed the AI abstains, clinicians underdiagnose (18% increase in missed diagnoses) and undertreat (35% increase in missed treatments) compared to no AI input at all. Our findings underscore the importance of empirically validating assumptions about how humans engage with AI within human-AI systems.
Authors:Kenneth Ge, Ryan Paul, Priscilla Zhang, JooYoung Seo
Abstract:
Writing mathematical notation requires substantial effort, diverting cognitive resources from conceptual understanding to documentation mechanics, significantly impacting individuals with fine motor disabilities (FMDs). Current limits of speech-based math technologies rely on precise dictation of math symbols and unintuitive command-based interfaces. We present a novel voice-powered math workspace, applying neuroscience insights to create an intuitive problem-solving environment. To minimize cognitive load, we leverage large language models with our novel context engine to support natural language interaction. Ultimately, we enable fluid mathematical engagement for individuals with FMDs -- freed from mechanical constraints.
Authors:Abasi-amefon Obot Affia-Jomants, Alexander Serebrenik, James D. Herbsleb, Alexander Nolte
Abstract:
Hybrid hackathons, which combine in-person and online participation, present unique challenges for organizers and participants. Although such events are increasingly conducted, research on them remains fragmented, with limited integration between hackathon studies and hybrid collaboration. Existing strategies for in-person or online-only events often fail to address the unique challenges of hybrid formats, such as managing communication across physical and virtual spaces. Our work addresses this gap by examining how hybrid hackathons function, analyzing how organizers structure these events and how participants navigate hybrid-specific challenges. Drawing on established theories of hybrid collaboration, we examine key dimensions - synchronicity, physical distribution, dynamic transitions, and technological infrastructure - that shape collaboration in hybrid events. Through an exploratory case study of three hackathon events, we analyze how these dimensions are implemented and their effects on participant experiences. Our findings reveal differing organizer considerations of the hybrid dimensions in the hackathon design, leading to distinct experiences for participants. Implementation styles - favoring in-person, online, or balanced participation - led to varied participant experiences, affecting access to resources, communication, and team coordination. Organizers in our study also relied on technology to bridge hybrid interactions, but overlooked critical aspects like time-zone management, dynamic transitions, and targeted support for hybrid teams. Additionally, participants in their teams responded to gaps in event scaffolding by adapting collaboration strategies, revealing gaps in organizers' preparedness for hybrid events. Learning from our findings, we offer practical recommendations when organizing hybrid hackathon events and recommendations to participants when attending them.
Authors:Wei Xiang, Chuyue Zhang, Jie Yan
Abstract:
Automated driving in level 3 autonomy has been adopted by multiple companies such as Tesla and BMW, alleviating the burden on drivers while unveiling new complexities. This article focused on the under-explored territory of micro accidents during automated driving, characterized as not fatal but abnormal aberrations such as abrupt deceleration and snake driving. These micro accidents are basic yet pervasive events that might results in more severe accidents. Through collecting a comprehensive dataset of user generated video recording such micro accidents in natural driving scenarios, this article locates key variables pertaining to environments and autonomous agents using machine learning methods. Subsequently, crowdsourcing method provides insights into human risk perceptions and reactions to these micro accidents. This article thus describes features of safety critical scenarios other than crashes and fatal accidents, informing and potentially advancing the design of automated driving systems.
Authors:Ahmed M. Abuzuraiq, Philippe Pasquier
Abstract:
Explainable AI (XAI) in creative contexts can go beyond transparency to support artistic engagement, modifiability, and sustained practice. While curated datasets and training human-scale models can offer artists greater agency and control, large-scale generative models like text-to-image diffusion systems often obscure these possibilities. We suggest that even large models can be treated as creative materials if their internal structure is exposed and manipulable. We propose a craft-based approach to explainability rooted in long-term, hands-on engagement akin to Schön's "reflection-in-action" and demonstrate its application through a model-bending and inspection plugin integrated into the node-based interface of ComfyUI. We demonstrate that by interactively manipulating different parts of a generative model, artists can develop an intuition about how each component influences the output.
Authors:Caroline M. Johnston, Olga Koumoundouros, Angel Hsing-Chi Hwang, Laura Onasch-Vera, Eric Rice, Phebe Vayanos
Abstract:
Artificial intelligence researchers have proposed various data-driven algorithms to improve the processes that match individuals experiencing homelessness to scarce housing resources. It remains unclear whether and how these algorithms are received or adopted by practitioners and what their corresponding consequences are. Through semi-structured interviews with 13 policymakers in homeless services in Los Angeles, we investigate whether such change-makers are open to the idea of integrating AI into the housing resource matching process, identifying where they see potential gains and drawbacks from such a system in issues of efficiency, fairness, and transparency. Our qualitative analysis indicates that, even when aware of various complicating factors, policymakers welcome the idea of an AI matching tool if thoughtfully designed and used in tandem with human decision-makers. Though there is no consensus as to the exact design of such an AI system, insights from policymakers raise open questions and design considerations that can be enlightening for future researchers and practitioners who aim to build responsible algorithmic systems to support decision-making in low-resource scenarios.
Authors:Roberto Balestri, Guglielmo Pescatore
Abstract:
Serialized television narratives present significant analytical challenges due to their complex, temporally distributed storylines that necessitate sophisticated information management. This paper introduces a multi-agent system (MAS) designed to extract and analyze narrative arcs by implementing principles of computational memory architectures. The system conceptualizes narrative understanding through analogues of human memory: Large Language Models (LLMs) provide a form of semantic memory for general narrative patterns, while a vector database stores specific arc progressions as episodic memories. A multi-agent workflow simulates working memory processes to integrate these information types. Tested on the first season of Grey's Anatomy (ABC 2005-), the MAS identifies three arc types: Anthology (self-contained), Soap (relationship-focused), and Genre-Specific. These arcs and their episodic developments are stored in a vector database, facilitating structured analysis and semantic comparison. To bridge automation with critical interpretation, a graphical interface enables human oversight and refinement of the system's narrative memory. While demonstrating strong performance in identifying Anthology Arcs and character entities, the system's reliance on textual paratexts (episode summaries) revealed limitations in discerning overlapping arcs and opaque dynamics, underscoring the challenges in computational memory consolidation versus human holistic understanding. This memory-centric approach highlights the potential of combining AI-driven memory processing with human expertise. Beyond television, it offers promise for serialized written formats where narrative is entirely text-based. Future work will focus on integrating multimodal inputs to enrich episodic memory, refining memory integration mechanisms within the MAS, and expanding testing across diverse genres.
Authors:Weihan Zhang, Jun Tao
Abstract:
Explorative flow visualization allows domain experts to analyze complex flow structures by interactively investigating flow patterns. However, traditional visual interfaces often rely on specialized graphical representations and interactions, which require additional effort to learn and use. Natural language interaction offers a more intuitive alternative, but teaching machines to recognize diverse scientific concepts and extract corresponding structures from flow data poses a significant challenge. In this paper, we introduce an automated framework that aligns flow pattern representations with the semantic space of large language models (LLMs), eliminating the need for manual labeling. Our approach encodes streamline segments using a denoising autoencoder and maps the generated flow pattern representations to LLM embeddings via a projector layer. This alignment empowers semantic matching between textual embeddings and flow representations through an attention mechanism, enabling the extraction of corresponding flow patterns based on textual descriptions. To enhance accessibility, we develop an interactive interface that allows users to query and visualize flow structures using natural language. Through case studies, we demonstrate the effectiveness of our framework in enabling intuitive and intelligent flow exploration.
Authors:Nizi Nazar, Ehsaneddin Asgari
Abstract:
Emotional Intelligence (EI) is a critical yet underexplored dimension in the development of human-aligned LLMs. To address this gap, we introduce a unified, psychologically grounded four-layer taxonomy of EI tailored for large language models (LLMs), encompassing emotional tracking, cause inference, appraisal, and emotionally appropriate response generation. Building on this framework, we present EICAP-Bench, a novel MCQ style multi-turn benchmark designed to evaluate EI capabilities in open-source LLMs across diverse linguistic and cultural contexts. We evaluate six LLMs: LLaMA3 (8B), LLaMA3-Instruct, Gemma (9B), Gemma-Instruct, Qwen2.5 (7B), and Qwen2.5-Instruct on EmoCap-Bench, identifying Qwen2.5-Instruct as the strongest baseline. To assess the potential for enhancing EI capabilities, we fine-tune both Qwen2.5-Base and Qwen2.5-Instruct using LoRA adapters on UltraChat (UC), a large-scale, instruction-tuned dialogue dataset, in both English and Arabic. Our statistical analysis reveals that among the five EI layers, only the Appraisal layer shows significant improvement through UC-based fine-tuning. These findings highlight the limitations of existing pretraining and instruction-tuning paradigms in equipping LLMs with deeper emotional reasoning and underscore the need for targeted data and modeling strategies for comprehensive EI alignment.
Authors:Wei Xiang, Ziyue Lei, Haoyuan Che, Fangyuan Ye, Xueting Wu, Lingyun Sun
Abstract:
Operational skill learning, inherently physical and reliant on hands-on practice and kinesthetic feedback, has yet to be effectively replicated in large language model (LLM)-supported training. Current LLM training assistants primarily generate customized textual feedback, neglecting the crucial kinesthetic modality. This gap derives from the textual and uncertain nature of LLMs, compounded by concerns on user acceptance of LLM driven body control. To bridge this gap and realize the potential of collaborative human-LLM action, this work explores human experience of LLM driven kinesthetic assistance. Specifically, we introduced an "Align-Analyze-Adjust" strategy and developed FlightAxis, a tool that integrates LLM with Electrical Muscle Stimulation (EMS) for flight skill acquisition, a representative operational skill domain. FlightAxis learns flight skills from manuals and guides forearm movements during simulated flight tasks. Our results demonstrate high user acceptance of LLM-mediated body control and significantly reduced task completion times. Crucially, trainees reported that this kinesthetic assistance enhanced their awareness of operation flaws and fostered increased engagement in the training process, rather than relieving perceived load. This work demonstrated the potential of kinesthetic LLM training in operational skill acquisition.
Authors:Siddharth Gangwar, David A. Selby, Sebastian J. Vollmer
Abstract:
Making a good graphic that accurately and efficiently conveys the desired message to the audience is both an art and a science, typically not taught in the data science curriculum. Visualisation makeovers are exercises where the community exchange feedback to improve charts and data visualizations. Can multi-modal large language models (LLMs) emulate this task? Given a plot in the form of an image file, or the code used to generate it, an LLM, primed with a list of visualization best practices, is employed to semi-automatically generate constructive criticism to produce a better plot. Our system is centred around prompt engineering of a pre-trained model, relying on a combination of userspecified guidelines and any latent knowledge of data visualization practices that might lie within an LLMs training corpus. Unlike other works, the focus is not on generating valid visualization scripts from raw data or prompts, but on educating the user how to improve their existing data visualizations according to an interpretation of best practices. A quantitative evaluation is performed to measure the sensitivity of the LLM agent to various plotting issues across different chart types. We make the tool available as a simple self-hosted applet with an accessible Web interface.
Authors:Fenya Wasserroth, Eleftherios Avramidis, Vera Czehmann, Tanja Kojic, Fabrizio Nunnari, Sebastian Möller
Abstract:
This paper presents an investigation into the impact of adding adjustment features to an existing sign language (SL) avatar on a Microsoft Hololens 2 device. Through a detailed analysis of interactions of expert German Sign Language (DGS) users with both adjustable and non-adjustable avatars in a specific use case, this study identifies the key factors influencing the comprehensibility, the user experience (UX), and the acceptability of such a system. Despite user preference for adjustable settings, no significant improvements in UX or comprehensibility were observed, which remained at low levels, amid missing SL elements (mouthings and facial expressions) and implementation issues (indistinct hand shapes, lack of feedback and menu positioning). Hedonic quality was rated higher than pragmatic quality, indicating that users found the system more emotionally or aesthetically pleasing than functionally useful. Stress levels were higher for the adjustable avatar, reflecting lower performance, greater effort and more frustration. Additionally, concerns were raised about whether the Hololens adjustment gestures are intuitive and easy to familiarise oneself with. While acceptability of the concept of adjustability was generally positive, it was strongly dependent on usability and animation quality. This study highlights that personalisation alone is insufficient, and that SL avatars must be comprehensible by default. Key recommendations include enhancing mouthing and facial animation, improving interaction interfaces, and applying participatory design.
Authors:Wei Xiang, Muchen Li, Jie Yan, Manling Zheng, Hanfei Zhu, Mengyun Jiang, Lingyun Sun
Abstract:
Level 3 automated driving systems allows drivers to engage in secondary tasks while diminishing their perception of risk. In the event of an emergency necessitating driver intervention, the system will alert the driver with a limited window for reaction and imposing a substantial cognitive burden. To address this challenge, this study employs a Large Language Model (LLM) to assist drivers in maintaining an appropriate attention on road conditions through a "humanized" persuasive advice. Our tool leverages the road conditions encountered by Level 3 systems as triggers, proactively steering driver behavior via both visual and auditory routes. Empirical study indicates that our tool is effective in sustaining driver attention with reduced cognitive load and coordinating secondary tasks with takeover behavior. Our work provides insights into the potential of using LLMs to support drivers during multi-task automated driving.
Authors:Amit Kumar Das, Mohammad Tarun, Klaus Mueller
Abstract:
This paper evaluates the visualization literacy of modern Large Language Models (LLMs) and introduces a novel prompting technique called Charts-of-Thought. We tested three state-of-the-art LLMs (Claude-3.7-sonnet, GPT-4.5 preview, and Gemini-2.0-pro) on the Visualization Literacy Assessment Test (VLAT) using standard prompts and our structured approach. The Charts-of-Thought method guides LLMs through a systematic data extraction, verification, and analysis process before answering visualization questions. Our results show Claude-3.7-sonnet achieved a score of 50.17 using this method, far exceeding the human baseline of 28.82. This approach improved performance across all models, with score increases of 21.8% for GPT-4.5, 9.4% for Gemini-2.0, and 13.5% for Claude-3.7 compared to standard prompting. The performance gains were consistent across original and modified VLAT charts, with Claude correctly answering 100% of questions for several chart types that previously challenged LLMs. Our study reveals that modern multimodal LLMs can surpass human performance on visualization literacy tasks when given the proper analytical framework. These findings establish a new benchmark for LLM visualization literacy and demonstrate the importance of structured prompting strategies for complex visual interpretation tasks. Beyond improving LLM visualization literacy, Charts-of-Thought could also enhance the accessibility of visualizations, potentially benefiting individuals with visual impairments or lower visualization literacy.
Authors:Yang Liu, Thorbjørn Mikkelsen, Zehai Liu, Gengchen Tian, Diako Mardanbegi, Qiushi Zhou, Hans Gellersen, Ken Pfeuffer
Abstract:
In 3D user interfaces, reaching out to grab and manipulate something works great until it is out of reach. Indirect techniques like gaze and pinch offer an alternative for distant interaction, but do not provide the same immediacy or proprioceptive feedback as direct gestures. To support direct gestures for faraway objects, we introduce SightWarp, an interaction technique that exploits eye-hand coordination to seamlessly summon object proxies to the user's fingertips. The idea is that after looking at a distant object, users either shift their gaze to the hand or move their hand into view-triggering the creation of a scaled near-space proxy of the object and its surrounding context. The proxy remains active until the eye-hand pattern is released. The key benefit is that users always have an option to immediately operate on the distant object through a natural, direct hand gesture. Through a user study of a 3D object docking task, we show that users can easily employ SightWarp, and that subsequent direct manipulation improves performance over gaze and pinch. Application examples illustrate its utility for 6DOF manipulation, overview-and-detail navigation, and world-in-miniature interaction. Our work contributes to expressive and flexible object interactions across near and far spaces.
Authors:Amit Kumar Das, Klaus Mueller
Abstract:
Misleading visualizations pose a significant challenge to accurate data interpretation. While recent research has explored the use of Large Language Models (LLMs) for detecting such misinformation, practical tools that also support explanation and correction remain limited. We present MisVisFix, an interactive dashboard that leverages both Claude and GPT models to support the full workflow of detecting, explaining, and correcting misleading visualizations. MisVisFix correctly identifies 96% of visualization issues and addresses all 74 known visualization misinformation types, classifying them as major, minor, or potential concerns. It provides detailed explanations, actionable suggestions, and automatically generates corrected charts. An interactive chat interface allows users to ask about specific chart elements or request modifications. The dashboard adapts to newly emerging misinformation strategies through targeted user interactions. User studies with visualization experts and developers of fact-checking tools show that MisVisFix accurately identifies issues and offers useful suggestions for improvement. By transforming LLM-based detection into an accessible, interactive platform, MisVisFix advances visualization literacy and supports more trustworthy data communication.
Authors:Sayef Azad Sakin, Katherine E. Isaacs
Abstract:
Parallel event sequences, such as those collected in program execution traces and automated manufacturing pipelines, are typically visualized as interactive parallel timelines. As the dataset size grows, these charts frequently experience lag during common interactions such as zooming, panning, and filtering. Summarization approaches can improve interaction performance, but at the cost of accuracy in representation. To address this challenge, we introduce ESeMan (Event Sequence Manager), an event sequence management system designed to support interactive rendering of timeline visualizations with tunable accuracy. ESeMan employs hierarchical data structures and intelligent caching to provide visualizations with only the data necessary to generate accurate summarizations with significantly reduced data fetch time. We evaluate ESeMan's query times against summed area tables, M4 aggregation, and statistical sub-sampling on a variety of program execution traces. Our results demonstrate ESeMan provides better performance, achieving sub-100ms fetch times while maintaining visualization accuracy at the pixel level. We further present our benchmarking harness, enabling future performance evaluations for event sequence visualization.
Authors:Se Won Oh, Hyuntae Jeong, Seungeun Chung, Jeong Mook Lim, Kyoung Ju Noh, Sunkyung Lee, Gyuwon Jung
Abstract:
Improving human health and well-being requires an accurate and effective understanding of an individual's physical and mental state throughout daily life. To support this goal, we utilized smartphones, smartwatches, and sleep sensors to collect data passively and continuously for 24 hours a day, with minimal interference to participants' usual behavior, enabling us to gather quantitative data on daily behaviors and sleep activities across multiple days. Additionally, we gathered subjective self-reports of participants' fatigue, stress, and sleep quality through surveys conducted immediately before and after sleep. This comprehensive lifelog dataset is expected to provide a foundational resource for exploring meaningful insights into human daily life and lifestyle patterns, and a portion of the data has been anonymized and made publicly available for further research. In this paper, we introduce the ETRI Lifelog Dataset 2024, detailing its structure and presenting potential applications, such as using machine learning models to predict sleep quality and stress.
Authors:Zhuoqun Jiang, ShunYi Yeo, Wei Xuan Donovan Seow, Simon Perrault
Abstract:
Mutual reminiscence, defined as revisiting shared positive memories through reciprocal self-disclosure, strengthens emotional bonds, enhances well-being, and deepens intimacy. However, most technology-mediated reminiscence tools emphasize individual reflection or one-way storytelling, which overlooks the dynamic, interactive dialogue essential for meaningful mutual reminiscence. To address this limitation, we introduce Remini, a chatbot designed to support reciprocal self-disclosure between close partners such as couples, friends, or family members. Grounded in the Social Functions of Autobiographical Memory (SFAM) framework, Remini uses conversational AI to guide emotionally rich exchanges through five narrative phases: rapport building, memory narration, elaboration, reflection, and summary. In a mixed-method, both between- and within- subjects study (N = 48, 24 dyads), we compare Remini to a baseline chatbot that offers minimal memory-trigger prompts. Our findings show that structured guidance from Remini significantly improves positive affect, feeling of connection, and engagement. It also fosters more detailed narrative co-construction and greater reciprocal self-disclosure. Participant feedback highlights the practical value, perceived benefits, and design considerations of chatbot-mediated reminiscence. We contribute empirically grounded design implications for conversational agents that strengthen human connection through mutual reminiscence.
Authors:Ekai Hashimoto, Takeshi Mizumoto, Kohei Nagira, Shun Shiramatsu
Abstract:
In recent years, many companies have recognized the importance of human resources and are investing in human capital to revitalize their organizations and enhance internal communication, thereby fostering innovation. However, conventional quantification methods have mainly focused on readily measurable indicators without addressing the fundamental role of conversations in human capital. This study focuses on routine meetings and proposes strategies to visualize human capital by analyzing speech amount during these meetings. We employ conversation visualization technology, which operates effectively, to quantify speech. We then measure differences in speech amount by attributes such as gender and job post, changes in speech amount depending on whether certain participants are present, and correlations between speech amount and continuous attributes. To verify the effectiveness of our proposed methods, we analyzed speech amounts by departmental affiliation during weekly meetings at small to medium enterprises.
Authors:Mansi Sharma, Camilo Andrés MartÃnez MartÃnez, Benedikt Emanuel Wirth, Antonio Krüger, Philipp Müller
Abstract:
Distinguishing target from non-target fixations during visual search is a fundamental building block to understand users' intended actions and to build effective assistance systems. While prior research indicated the feasibility of classifying target vs. non-target fixations based on eye tracking and electroencephalography (EEG) data, these studies were conducted with explicitly instructed search trajectories, abstract visual stimuli, and disregarded any scene context. This is in stark contrast with the fact that human visual search is largely driven by scene characteristics and raises questions regarding generalizability to more realistic scenarios. To close this gap, we, for the first time, investigate the classification of target vs. non-target fixations during free visual search in realistic scenes. In particular, we conducted a 36-participants user study using a large variety of 140 realistic visual search scenes in two highly relevant application scenarios: searching for icons on desktop backgrounds and finding tools in a cluttered workshop. Our approach based on gaze and EEG features outperforms the previous state-of-the-art approach based on a combination of fixation duration and saccade-related potentials. We perform extensive evaluations to assess the generalizability of our approach across scene types. Our approach significantly advances the ability to distinguish between target and non-target fixations in realistic scenarios, achieving 83.6% accuracy in cross-user evaluations. This substantially outperforms previous methods based on saccade-related potentials, which reached only 56.9% accuracy.
Authors:Mark Steyvers, Lukas Mayer
Abstract:
AI systems and technologies that can interact with humans in real time face a communication dilemma: when to offer assistance and how frequently. Overly frequent or contextually redundant assistance can cause users to disengage, undermining the long-term benefits of AI assistance. We introduce a cognitive modeling framework based on Partially Observable Markov Decision Processes (POMDPs) that addresses this timing challenge by inferring a user's latent cognitive state related to AI engagement over time. Additionally, our framework incorporates reasoning about the long-term effects of AI assistance, explicitly aiming to avoid actions that could lead the human user to disengage or deactivate the AI. A key component of our approach is counterfactual reasoning: at each time step, the AI considers how well the user would perform independently and weighs the potential boost in performance against the risk of diminishing engagement with the AI. Through simulations, we show that this adaptive strategy significantly outperforms baseline policies in which assistance is always provided or never provided. Our results highlight the importance of balancing short-term decision accuracy with sustained user engagement, showing how communication strategies can be optimized to avoid alert fatigue while preserving the user's receptiveness to AI guidance.
Authors:Tyrone Justin Sta Maria, Faith Griffin, Jordan Aiko Deja
Abstract:
Gestures are an expressive input modality for controlling multiple robots, but their use is often limited by rigid mappings and recognition constraints. To move beyond these limitations, we propose roleplaying metaphors as a scaffold for designing richer interactions. By introducing three roles: Director, Puppeteer, and Wizard, we demonstrate how narrative framing can guide the creation of diverse gesture sets and interaction styles. These roles enable a variety of scenarios, showing how roleplay can unlock new possibilities for multi-robot systems. Our approach emphasizes creativity, expressiveness, and intuitiveness as key elements for future human-robot interaction design.
Authors:Yongsu Ahn, Nam Wook Kim
Abstract:
This paper investigates why recent generative AI models outperform humans in data visualization knowledge tasks. Through systematic comparative analysis of responses to visualization questions, we find that differences exist between two ChatGPT models and human outputs over rhetorical structure, knowledge breadth, and perceptual quality. Our findings reveal that ChatGPT-4, as a more advanced model, displays a hybrid of characteristics from both humans and ChatGPT-3.5. The two models were generally favored over human responses, while their strengths in coverage and breadth, and emphasis on technical and task-oriented visualization feedback collectively shaped higher overall quality. Based on our findings, we draw implications for advancing user experiences based on the potential of LLMs and human perception over their capabilities, with relevance to broader applications of AI.
Authors:Chaojia Yu, Zihan Cheng, Hanwen Cui, Yishuo Gao, Zexu Luo, Yijin Wang, Hangbin Zheng, Yong Zhao
Abstract:
In the age of large language models (LLMs), autonomous agents have emerged as a powerful paradigm for achieving general intelligence. These agents dynamically leverage tools, memory, and reasoning capabilities to accomplish user-defined goals. As agent systems grow in complexity, agent workflows-structured orchestration frameworks-have become central to enabling scalable, controllable, and secure AI behaviors. This survey provides a comprehensive review of agent workflow systems, spanning academic frameworks and industrial implementations. We classify existing systems along two key dimensions: functional capabilities (e.g., planning, multi-agent collaboration, external API integration) and architectural features (e.g., agent roles, orchestration flows, specification languages). By comparing over 20 representative systems, we highlight common patterns, potential technical challenges, and emerging trends. We further address concerns related to workflow optimization strategies and security. Finally, we outline open problems such as standardization and multimodal integration, offering insights for future research at the intersection of agent design, workflow infrastructure, and safe automation.
Authors:Shumeng Zhang, Raul Masu, Mela Bettega, Mingming Fan
Abstract:
This paper presents a systematic literature review of music technology tailored for blind and low vision (BLV) individuals. Music activities can be particularly beneficial for BLV people. However, a systematic approach to organizing knowledge on designing accessible technology for BLV people has yet to be attempted. We categorize the existing studies based on the type of technology and the extent of BLV people's involvement in the research. We identify six main categories of BLV people-oriented music technology and highlight four key trends in design goals. Based on these categories, we propose four general insights focusing on (1) spatial awareness, (2) access to information, (3) (non-verbal) communication, and (4) memory. The identified trends suggest that more empirical studies involving BLV people in real-world scenarios are needed to ensure that technological advancements can enhance musical experiences and social inclusion. This research proposes collaborative music technology and inclusive real-world testing with the target group as two key areas missing in current research. They serve as a foundational step in shifting the focus from ``accessible technology'' to ``inclusive technology'' for BLV individuals within the broader field of accessibility research.
Authors:Abeer Dyoub, Ivan Letteri, Francesca A. Lisi
Abstract:
The emergence of Symbiotic AI (SAI) introduces new challenges to ethical decision-making as it deepens human-AI collaboration. As symbiosis grows, AI systems pose greater ethical risks, including harm to human rights and trust. Ethical Risk Assessment (ERA) thus becomes crucial for guiding decisions that minimize such risks. However, ERA is hindered by uncertainty, vagueness, and incomplete information, and morality itself is context-dependent and imprecise. This motivates the need for a flexible, transparent, yet robust framework for ERA. Our work supports ethical decision-making by quantitatively assessing and prioritizing multiple ethical risks so that artificial agents can select actions aligned with human values and acceptable risk levels. We introduce ff4ERA, a fuzzy framework that integrates Fuzzy Logic, the Fuzzy Analytic Hierarchy Process (FAHP), and Certainty Factors (CF) to quantify ethical risks via an Ethical Risk Score (ERS) for each risk type. The final ERS combines the FAHP-derived weight, propagated CF, and risk level. The framework offers a robust mathematical approach for collaborative ERA modeling and systematic, step-by-step analysis. A case study confirms that ff4ERA yields context-sensitive, ethically meaningful risk scores reflecting both expert input and sensor-based evidence. Risk scores vary consistently with relevant factors while remaining robust to unrelated inputs. Local sensitivity analysis shows predictable, mostly monotonic behavior across perturbations, and global Sobol analysis highlights the dominant influence of expert-defined weights and certainty factors, validating the model design. Overall, the results demonstrate ff4ERA ability to produce interpretable, traceable, and risk-aware ethical assessments, enabling what-if analyses and guiding designers in calibrating membership functions and expert judgments for reliable ethical decision support.
Authors:Sumit Kumar, Sarthak Kapoor, Harsh Vardhan, Yao Zhao
Abstract:
Large Language Models (LLMs) are revolutionizing industries by enhancing efficiency, scalability, and innovation. This paper investigates the potential of LLMs in automating Computer-Aided Design (CAD) workflows, by integrating FreeCAD with LLM as CAD design tool. Traditional CAD processes are often complex and require specialized sketching skills, posing challenges for rapid prototyping and generative design. We propose a framework where LLMs generate initial CAD scripts from natural language descriptions, which are then executed and refined iteratively based on error feedback. Through a series of experiments with increasing complexity, we assess the effectiveness of this approach. Our findings reveal that LLMs perform well for simple to moderately complex designs but struggle with highly constrained models, necessitating multiple refinements. The study highlights the need for improved memory retrieval, adaptive prompt engineering, and hybrid AI techniques to enhance script robustness. Future directions include integrating cloud-based execution and exploring advanced LLM capabilities to further streamline CAD automation. This work underscores the transformative potential of LLMs in design workflows while identifying critical areas for future development.
Authors:Andrew Blair, Peggy Gregory, Mary Ellen Foster
Abstract:
Recent technological advances have allowed robots to assist in the service sector, and consequently accelerate job and sector transformation. Less attention has been paid to the use of robots in real-world organisations where social benefits, as opposed to profits, are the primary motivator. To explore these opportunities, we have partnered with a working church and visitor attraction. We conducted interviews with 15 participants from a range of stakeholder groups within the church to understand worker perspectives of introducing a social robot to the church and analysed the results using reflexive thematic analysis. Findings indicate mixed responses to the use of a robot, with participants highlighting the empathetic responsibility the church has towards people and the potential for unintended consequences. However, information provision and alleviation of menial or mundane tasks were identified as potential use cases. This highlights the need to consider not only the financial aspects of robot introduction, but also how social and intangible values shape what roles a robot should take on within an organisation.
Authors:Ye Sun, Bowei Zhao, Dezhong Yao, Rui Zhang, Bohan Zhang, Xiaoyuan Li, Jing Wang, Mingxuan Qu, Gang Liu
Abstract:
Brain-computer interfaces (BCIs) enable real-time interaction between the brain and external devices by decoding neural signals. However, existing motor-based BCI paradigms, like motor imagery BCI, face challenges with imprecise labeling in real-world use. This mismatch between EEG signals and true behavioral intentions leads to pseudo-labels, undermining decoding accuracy and system robustness. To overcome this bottleneck, this paper first proposes a novel motor intention extraction framework based on a non-invasive brain-muscle interface (BMuI)($\text{BCI} = \frac{\text{Brain}}{\text{Computer}} \text{ Interface} = \frac{\text{Brain}}{\not\text{Muscle}}\! \text{ (BMuI)} \times \!\frac{\not\text{Muscle}}{\text{Computer}}\! \text{ Interface}$). This method simulates the neural pathway from the brain to the muscles in order to capture and enhance the weak motor intention signals originating in the brain. It then uses EMG as a high-fidelity relay medium to achieve more accurate intention recognition and transmission. To systematically validate the feasibility and effectiveness of this approach, we conducted both offline experiments (to repeatedly verify feasibility) and online experiments (to construct a real-time interactive system and evaluate its performance). The results show that BMuI is feasible, achieving a prediction accuracy of 0.8314; in the online experiment, all participants are able to successfully control the Unity virtual arm.
Authors:Saadiq Rauf Khan, Vinit Chandak, Sougata Mukherjea
Abstract:
Information Visualization has been utilized to gain insights from complex data. In recent times, Large Language models (LLMs) have performed very well in many tasks. In this paper, we showcase the capabilities of different popular LLMs to generate code for visualization based on simple prompts. We also analyze the power of LLMs to understand some common visualizations by answering questions. Our study shows that LLMs could generate code for some simpler visualizations such as bar and pie charts. Moreover, they could answer simple questions about visualizations. However, LLMs also have several limitations. For example, some of them had difficulty generating complex visualizations, such as violin plot. LLMs also made errors in answering some questions about visualizations, for example, identifying relationships between close boundaries and determining lengths of shapes. We believe that our insights can be used to improve both LLMs and Information Visualization systems.
Authors:Max Sondag, Christofer Meinecke, Dennis Collaris, Tatiana von Landesberger, Stef van den Elzen
Abstract:
Random forests are a machine learning method used to automatically classify datasets and consist of a multitude of decision trees. While these random forests often have higher performance and generalize better than a single decision tree, they are also harder to interpret. This paper presents a visualization method and system to increase interpretability of random forests. We cluster similar trees which enables users to interpret how the model performs in general without needing to analyze each individual decision tree in detail, or interpret an oversimplified summary of the full forest. To meaningfully cluster the decision trees, we introduce a new distance metric that takes into account both the decision rules as well as the predictions of a pair of decision trees. We also propose two new visualization methods that visualize both clustered and individual decision trees: (1) The Feature Plot, which visualizes the topological position of features in the decision trees, and (2) the Rule Plot, which visualizes the decision rules of the decision trees. We demonstrate the efficacy of our approach through a case study on the "Glass" dataset, which is a relatively complex standard machine learning dataset, as well as a small user study.
Authors:Michael Kaiser, Clemens GroÃ, Lisa Marie Otto, Steffen Müller
Abstract:
Testing and evaluating automated driving systems (ADS) in interactions with vulnerable road users (VRUs), such as cyclists, are essential for improving the safety of VRUs, but often lack realism. This paper presents and validates a coupled in-the-loop test environment that integrates a Cyclist-in-the Loop test bench with a Vehicle-in-the-Loop test bench via a virtual environment (VE) developed in Unreal Engine 5. The setup enables closed-loop, bidirectional interaction between a real human cyclist and a real automated vehicle under safe and controllable conditions. The automated vehicle reacts to cyclist gestures via stimulated camera input, while the cyclist, riding a stationary bicycle, perceives and reacts to the vehicle in the VE in real time. Validation experiments are conducted using a real automated shuttle bus with a track-and-follow function, performing three test maneuvers - straight-line driving with stop, circular track driving, and double lane change - on a proving ground and in the coupled in-the-loop test environment. The performance is evaluated by comparing the resulting vehicle trajectories in both environments. Additionally, the introduced latencies of individual components in the test setup are measured. The results demonstrate the feasibility of the approach and highlight its strengths and limitations for realistic ADS evaluation.
Authors:Hannah Kim, Sergei L. Kosakovsky Pond, Stephen MacNeil
Abstract:
This full research paper investigates the impact of generative AI (GenAI) on the learner experience, with a focus on how learners engage with and utilize the information it provides. In e-learning environments, learners often need to navigate a complex information space on their own. This challenge is further compounded in interdisciplinary fields like bioinformatics, due to the varied prior knowledge and backgrounds. In this paper, we studied how GenAI influences information search in bioinformatics research: (1) How do interactions with a GenAI chatbot influence learner orienteering behaviors?; and (2) How do learners identify information scent in GenAI chatbot responses? We adopted an autoethnographic approach to investigate these questions. GenAI was found to support orienteering once a learning plan was established, but it was counterproductive prior to that. Moreover, traditionally value-rich information sources such as bullet points and related terms proved less effective when applied to GenAI responses. Information scents were primarily recognized through the presence or absence of prior knowledge of the domain. These findings suggest that GenAI should be adopted into e-learning environments with caution, particularly in interdisciplinary learning contexts.
Authors:Tingying He, Maggie McCracken, Daniel Hajas, Sarah Creem-Regehr, Alexander Lex
Abstract:
We investigate whether tactile charts support comprehension and learning of complex visualizations for blind and low-vision (BLV) individuals and contribute four tactile chart designs and an interview study. Visualizations are powerful tools for conveying data, yet BLV individuals typically can rely only on assistive technologies -- primarily alternative texts -- to access this information. Prior research shows the importance of mental models of chart types for interpreting these descriptions, yet BLV individuals have no means to build such a mental model based on images of visualizations. Tactile charts show promise to fill this gap in supporting the process of building mental models. Yet studies on tactile data representations mostly focus on simple chart types, and it is unclear whether they are also appropriate for more complex charts as would be found in scientific publications. Working with two BLV researchers, we designed 3D-printed tactile template charts with exploration instructions for four advanced chart types: UpSet plots, violin plots, clustered heatmaps, and faceted line charts. We then conducted an interview study with 12 BLV participants comparing whether using our tactile templates improves mental models and understanding of charts and whether this understanding translates to novel datasets experienced through alt texts. Thematic analysis shows that tactile models support chart type understanding and are the preferred learning method by BLV individuals. We also report participants' opinions on tactile chart design and their role in BLV education.
Authors:Victor Liu, Timothy Du, Jordy Sehn, Jack Collier, François Grondin
Abstract:
This paper presents a sound source localization strategy that relies on a microphone array embedded in an unmanned ground vehicle and an asynchronous close-talking microphone near the operator. A signal coarse alignment strategy is combined with a time-domain acoustic echo cancellation algorithm to estimate a time-frequency ideal ratio mask to isolate the target speech from interferences and environmental noise. This allows selective sound source localization, and provides the robot with the direction of arrival of sound from the active operator, which enables rich interaction in noisy scenarios. Results demonstrate an average angle error of 4 degrees and an accuracy within 5 degrees of 95\% at a signal-to-noise ratio of 1dB, which is significantly superior to the state-of-the-art localization methods.
Authors:Qian Huang, Thijs Willems
Abstract:
As generative AI (Gen-AI) tools become more prevalent in education, there is a growing need to understand how educators, not just students, can actively shape their design and use. This study investigates how two instructors integrated four custom GPT tools into a Masters-level Qualitative Research Methods course for Urban Planning Policy students. Addressing two key gaps: the dominant framing of students as passive AI users, and the limited use of AI in qualitative methods education. The study explores how Gen-AI can support disciplinary learning when aligned with pedagogical intent. Drawing on the Technological Pedagogical Content Knowledge (TPACK) framework and action research methodology, the instructors designed GPTs to scaffold tasks such as research question formulation, interview practice, fieldnote analysis, and design thinking. Thematic analysis of student reflections, AI chat logs, and final assignments revealed that the tools enhanced student reflexivity, improved interview techniques, and supported structured analytic thinking. However, students also expressed concerns about cognitive overload, reduced immersion in data, and the formulaic nature of AI responses. The study offers three key insights: AI can be a powerful scaffold for active learning when paired with human facilitation; custom GPTs can serve as cognitive partners in iterative research practice; and educator-led design is critical to pedagogically meaningful AI integration. This research contributes to emerging scholarship on AI in higher education by demonstrating how empowering educators to design custom tools can promote more reflective, responsible, and collaborative learning with AI.
Authors:Junwen Duan, Wei Xue, Ziyao Kang, Shixia Liu, Jiazhi Xia
Abstract:
Open-world object detection (OWOD) extends traditional object detection to identifying both known and unknown object, necessitating continuous model adaptation as new annotations emerge. Current approaches face significant limitations: 1) data-hungry training due to reliance on a large number of crowdsourced annotations, 2) susceptibility to "partial feature overfitting," and 3) limited flexibility due to required model architecture modifications. To tackle these issues, we present OW-CLIP, a visual analytics system that provides curated data and enables data-efficient OWOD model incremental training. OW-CLIP implements plug-and-play multimodal prompt tuning tailored for OWOD settings and introduces a novel "Crop-Smoothing" technique to mitigate partial feature overfitting. To meet the data requirements for the training methodology, we propose dual-modal data refinement methods that leverage large language models and cross-modal similarity for data generation and filtering. Simultaneously, we develope a visualization interface that enables users to explore and deliver high-quality annotations: including class-specific visual feature phrases and fine-grained differentiated images. Quantitative evaluation demonstrates that OW-CLIP achieves competitive performance at 89% of state-of-the-art performance while requiring only 3.8% self-generated data, while outperforming SOTA approach when trained with equivalent data volumes. A case study shows the effectiveness of the developed method and the improved annotation quality of our visualization system.
Authors:Jeffrey Heer, Dominik Moritz, Ron Pechuk
Abstract:
Though powerful tools for analysis and communication, interactive visualizations often fail to support real-time interaction with large datasets with millions or more records. To highlight and filter data, users indicate values or intervals of interest. Such selections may span multiple components, combine in complex ways, and require optimizations to ensure low-latency updates. We describe Mosaic Selections, a model for representing, managing, and optimizing user selections, in which one or more filter predicates are added to queries that request data for visualizations and input widgets. By analyzing both queries and selection predicates, Mosaic Selections enable automatic optimizations, including pre-aggregating data to rapidly compute selection updates. We contribute a formal description of our selection model and optimization methods, and their implementation in the open-source Mosaic architecture. Benchmark results demonstrate orders-of-magnitude latency improvements for selection-based optimizations over unoptimized queries and existing optimizers for the Vega language. The Mosaic Selection model provides infrastructure for flexible, interoperable filtering across multiple visualizations, alongside automatic optimizations to scale to millions and even billions of records.
Authors:Longfei Chen, Christopher Lochhead, Robert B. Fisher, Nusa Faric, Jacques Fleuriot, Subramanian Ramamoorthy
Abstract:
Beneficial daily activity interventions have been shown to improve both the physical and mental health of older adults. However, there is a lack of robust objective metrics and personalized strategies to measure their impact. In this study, two older adults aged over 65, living in Edinburgh, UK, selected their preferred daily interventions (mindful meals and art crafts), which are then assessed for effectiveness. The total monitoring period across both participants was 8 weeks. Their physical behaviours were continuously monitored using a non-contact, privacy-preserving camera-based system. Postural and mobility statistics were extracted using computer vision algorithms and compared across periods with and without the interventions. The results demonstrate significant behavioural changes for both participants, highlighting the effectiveness of both these activities and the monitoring system.
Authors:Margarita Leib, Nils Köbis, Ivan Soraperra
Abstract:
People increasingly rely on AI-advice when making decisions. At times, such advice can promote selfish behavior. When individuals abide by selfishness-promoting AI advice, how are they perceived and punished? To study this question, we build on theories from social psychology and combine machine-behavior and behavioral economic approaches. In a pre-registered, financially-incentivized experiment, evaluators could punish real decision-makers who (i) received AI, human, or no advice. The advice (ii) encouraged selfish or prosocial behavior, and decision-makers (iii) behaved selfishly or, in a control condition, behaved prosocially. Evaluators further assigned responsibility to decision-makers and their advisors. Results revealed that (i) prosocial behavior was punished very little, whereas selfish behavior was punished much more. Focusing on selfish behavior, (ii) compared to receiving no advice, selfish behavior was penalized more harshly after prosocial advice and more leniently after selfish advice. Lastly, (iii) whereas selfish decision-makers were seen as more responsible when they followed AI compared to human advice, punishment between the two advice sources did not vary. Overall, behavior and advice content shape punishment, whereas the advice source does not.
Authors:Son Quoc Tran, Tushaar Gangavarapu, Nicholas Chernogor, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil
Abstract:
We often rely on our intuition to anticipate the direction of a conversation. Endowing automated systems with similar foresight can enable them to assist human-human interactions. Recent work on developing models with this predictive capacity has focused on the Conversations Gone Awry (CGA) task: forecasting whether an ongoing conversation will derail. In this work, we revisit this task and introduce the first uniform evaluation framework, creating a benchmark that enables direct and reliable comparisons between different architectures. This allows us to present an up-to-date overview of the current progress in CGA models, in light of recent advancements in language modeling. Our framework also introduces a novel metric that captures a model's ability to revise its forecast as the conversation progresses.
Authors:Jonas Peché, Aliaksei Tsishurou, Alexander Zap, Guenter Wallner
Abstract:
A central area of interest in many competitive online games is spatial behavior which due to its complexity can be difficult to visualize. Such behaviors of interest include not only overall movement patterns but also being able to understand which player or team is exerting control over an area to inform decision-making. Map control can, however, be challenging to quantify. In this paper, we propose a method for calculating frontlines and first efforts towards a visualization of them. The visualization can show map control and frontlines at a specific time point or changes of these over time. For this purpose, it utilizes support vector machines to derive frontlines from unit positions. We illustrate our algorithm and visualization with examples based on the team-based online game World of Tanks.
Authors:Alireza Mortezapour, Andrea Antonio Cantone, Monica Maria Lucia Sebillo, Giuliana Vitiello
Abstract:
In pursuit of documenting users Neurophysiological responses during experiencing virtual environments (VE), this systematic review presents a novel conceptual model of UX in VE. Searching across seven databases yielded to 1743 articles. Rigorous screenings, included only 66 articles. Notably, UX in VE lacks a consensus definition. Obviously, this UX has many unique sub-dimensions that are not mentioned in other products. The presented conceptual model contains 26 subdimensions which mostly not supported in previous subjective tools and questionnaires. While EEG and ECG were common, brain ultrasound, employed in one study, highlights the need for using neurophysiological assessments to comprehensively grasp immersive UX intricacies.
Authors:Oliver Bates, Christian Remy, Kieran Cutting, Adam Tyler, Adrian Friday
Abstract:
What could designing for carbon reduction of heating and cooling in commercial settings look like in the near future? How can we challenge dominant mindsets and paradigms of efficiency and behaviour change? How can we help build worlds through our practice that can become future realities? This paper introduces the fictional consultancy ANCSTRL.LAB to explore opportunities for making space in research projects that can encourage more systems-oriented interventions. We present a design fiction that asks `what if energy management and reduction practice embraced systems thinking?'. Our design fiction explores how future energy consultancies could utilise systems thinking, and (more than) human centred design to re-imagine energy management practice and change systems in ways that are currently unfathomable. We finish by discussing how LIMITS research can utilise design fiction and speculative praxis to help build new material realities where more holistic perspectives, the leveraging of systems change, and the imagining of post-neoliberal futures is the norm.
Authors:Hongyu Zhou, Yihao Dong, Masahiko Inami, Zhanna Sarsenbayeva, Anusha Withana
Abstract:
The application and implementation of collaborative embodiment in virtual reality (VR) are a critical aspect of the computer science landscape, aiming to enhance multi-user interaction and teamwork in immersive environments. A notable and enduring area of collaborative embodiment research focuses on approaches that enable multiple users to share control, interact, and investigate scenarios involving supernumerary arms in virtual spaces. In this survey, we will present an extensive overview of the methodologies employed in the past decade to enable collaboration in VR environments, particularly through embodiment. Using the PRISMA guidelines, we plan to analyze the study details from over 137 relevant research papers. Through this analysis, a critical assessment of the effectiveness of these methodologies will be conducted, highlighting current challenges and limitations in implementing collaborative embodiment in VR. Lastly, we discuss potential future research directions and opportunities for enhancing collaboration embodiment in virtual environments.
Authors:UÄur Ãnal, Sanem Sariel, Metin Sezgin, Ergun Akleman
Abstract:
This article presents a case study comparing the capabilities of humans and artificial intelligence (AI) for visual storytelling. We developed detailed instructions to recreate a three-panel Nancy cartoon strip by Ernie Bushmiller and provided them to both humans and AI systems. The human participants were 20-something students with basic artistic training but no experience or knowledge of this comic strip. The AI systems used were popular commercial models trained to draw and paint like artists, though their training sets may not necessarily include Bushmiller's work. Results showed that AI systems excel at mimicking professional art but struggle to create coherent visual stories. In contrast, humans proved highly adept at transforming instructions into meaningful visual narratives.
Authors:Nisha Devasia, Georgia Kenderova, Michele Newman, Julie Kientz, Jin Ha Lee
Abstract:
While much of the research in digital games has emphasized hedonic experiences, such as flow, enjoyment, and positive affect, recent years have seen increased interest in eudaimonic gaming experiences, typically mixed-affect and associated with personal meaningfulness and growth. The formation of such experiences in games is theorized to have four constituent elements: motivation, game use, experience, and effects. However, while the first three elements have been relatively well explored in the literature, the effects - and how they may influence positive individual outcomes - have been underexplored thus far. To this end, in this work, we investigate the perceived outcomes of eudaimonic gaming and how different components of the experience influence these effects. We conducted a survey (n = 166) in which respondents recounted meaningful gaming experiences and how they affected their present lives. We used a mixed-methods approach to classify effects and identify significant subcomponents of their formation. We contribute an empirical understanding of how meaningful gaming experiences can lead to positive reflective, learning, social, health, and career effects, extending current theoretical models of eudaimonic gaming experiences and offering implications for how researchers and practitioners might use these findings to promote positive outcomes for players.
Authors:Victoria R. Li, Johnathan Sun, Martin Wattenberg
Abstract:
Charts and graphs help people analyze data, but can they also be useful to AI systems? To investigate this question, we perform a series of experiments with two commercial vision-language models: GPT 4.1 and Claude 3.5. Across three representative analysis tasks, the two systems describe synthetic datasets more precisely and accurately when raw data is accompanied by a scatterplot, especially as datasets grow in complexity. Comparison with two baselines -- providing a blank chart and a chart with mismatched data -- shows that the improved performance is due to the content of the charts. Our results are initial evidence that AI systems, like humans, can benefit from visualization.
Authors:Shu-Yuan Liu, Johan Engström, Gustav Markkula
Abstract:
Response timing measures play a crucial role in the assessment of automated driving systems (ADS) in collision avoidance scenarios, including but not limited to establishing human benchmarks and comparing ADS to human driver response performance. For example, measuring the response time (of a human driver or ADS) to a conflict requires the determination of a stimulus onset and a response onset. In existing studies, response onset relies on manual annotation or vehicle control signals such as accelerator and brake pedal movements. These methods are not applicable when analyzing large scale data where vehicle control signals are not available. This holds in particular for the rapidly expanding sets of ADS log data where the behavior of surrounding road users is observed via onboard sensors. To advance evaluation techniques for ADS and enable measuring response timing when vehicle control signals are not available, we developed a simple and efficient algorithm, based on a piecewise linear acceleration model, to automatically estimate brake onset that can be applied to any type of driving data that includes vehicle longitudinal time series data. We also proposed a manual annotation method to identify brake onset and used it as ground truth for validation. R^2 was used as a confidence metric to measure the accuracy of the algorithm, and its classification performance was analyzed using naturalistic collision avoidance data of both ADS and humans, where our method was validated against human manual annotation. Although our algorithm is subject to certain limitations, it is efficient, generalizable, applicable to any road user and scenario types, and is highly configurable.
Authors:Jan-Christoph Kalo, Fina Polat, Shubha Guha, Paul Groth
Abstract:
Modern AI systems are complex workflows containing multiple components and data sources. Data provenance provides the ability to interrogate and potentially explain the outputs of these systems. However, provenance is often too detailed and not contextualized for the user trying to understand the AI system. In this work, we present our vision for an interactive agent that works together with the user to co-construct an explanation that is simultaneously useful to the user as well as grounded in data provenance. To illustrate this vision, we present: 1) an initial prototype of such an agent; and 2) a scalable evaluation framework based on user simulations and a large language model as a judge approach.
Authors:Sam Gordon James, Miranda Elaine Glynis Armstrong, Aisling Ann O'Kane, Harry Emerson, Zahraa S. Abdallah
Abstract:
Background: Type 1 diabetes (T1D) has seen a rapid evolution in management technology and forms a useful case study for the future management of other chronic conditions. Further development of this management technology requires an exploration of its real-world use and the potential of additional data streams. To facilitate this, we contribute the BrisT1D Dataset to the growing number of public T1D management datasets. The dataset was developed from a longitudinal study of 24 young adults in the UK who used a smartwatch alongside their usual T1D management. Findings: The BrisT1D dataset features both device data from the T1D management systems and smartwatches used by participants, as well as transcripts of monthly interviews and focus groups conducted during the study. The device data is provided in a processed state, for usability and more rapid analysis, and in a raw state, for in-depth exploration of novel insights captured in the study. Conclusions: This dataset has a range of potential applications. The quantitative elements can support blood glucose prediction, hypoglycaemia prediction, and closed-loop algorithm development. The qualitative elements enable the exploration of user experiences and opinions, as well as broader mixed-methods research into the role of smartwatches in T1D management.
Authors:Xiaoan Liu, Difan Jia, Xianhao Carton Liu, Mar Gonzalez-Franco, Chen Zhu-Tian
Abstract:
Interacting with real-world objects in Mixed Reality (MR) often proves difficult when they are crowded, distant, or partially occluded, hindering straightforward selection and manipulation. We observe that these difficulties stem from performing interaction directly on physical objects, where input is tightly coupled to their physical constraints. Our key insight is to decouple interaction from these constraints by introducing proxies-abstract representations of real-world objects. We embody this concept in Reality Proxy, a system that seamlessly shifts interaction targets from physical objects to their proxies during selection. Beyond facilitating basic selection, Reality Proxy uses AI to enrich proxies with semantic attributes and hierarchical spatial relationships of their corresponding physical objects, enabling novel and previously cumbersome interactions in MR - such as skimming, attribute-based filtering, navigating nested groups, and complex multi object selections - all without requiring new gestures or menu systems. We demonstrate Reality Proxy's versatility across diverse scenarios, including office information retrieval, large-scale spatial navigation, and multi-drone control. An expert evaluation suggests the system's utility and usability, suggesting that proxy-based abstractions offer a powerful and generalizable interaction paradigm for future MR systems.
Authors:Abdulhaq Adetunji Salako, Hannes Hagen, Christian Tominski
Abstract:
Multi-faceted data visualization typically involves several dedicated views. To create a comprehensive understanding of the data, users have to mentally integrate the information from the different views. This integration is hindered by context switches between views and usually requires interactive methods such as brushing and linking. Animated transitions have also been shown to be able to mediate context switches and improve understanding. Yet, most existing animated transitions consider only basic views showing the same data facet. In this work, we study how the gap between node-link diagrams, showing graph structure, and parallel coordinates plots, showing multivariate attributes, can be narrowed via smooth animated transitions. Based on two design goals (traceability and swiftness), we outline a partial design space including several design options. These inform the implementation of two alternative transition variants: a basic variant with plain interpolation and an advanced variant that uses our design space and accepted animation techniques, including staging and staggering. In a preliminary study, we asked seven participants for qualitative feedback. We found that the swiftness of the basic variant is preferred, while the traceability of data items is better with the slower advanced variant.
Authors:Megha Quamara, Viktor Schmuck, Cristina Iani, Axel Primavesi, Alexander Plaum, Luca Vigano
Abstract:
In this paper, we present the findings of a user study that evaluated the social acceptance of eXtended Reality (XR) agent technology, focusing on a remotely accessible, web-based XR training system developed for journalists. This system involves user interaction with a virtual avatar, enabled by a modular toolkit. The interactions are designed to provide tailored training for journalists in digital-remote settings, especially for sensitive or dangerous scenarios, without requiring specialized end-user equipment like headsets. Our research adapts and extends the Almere model, representing social acceptance through existing attributes such as perceived ease of use and perceived usefulness, along with added ones like dependability and security in the user-agent interaction. The XR agent was tested through a controlled experiment in a real-world setting, with data collected on users' perceptions. Our findings, based on quantitative and qualitative measurements involving questionnaires, contribute to the understanding of user perceptions and acceptance of XR agent solutions within a specific social context, while also identifying areas for the improvement of XR systems.
Authors:Tanusree Sharma, Yihao Zhou, Visar Berisha
Abstract:
Early large-scale audio datasets, such as LibriSpeech, were built with hundreds of individual contributors whose voices were instrumental in the development of speech technologies, including audiobooks and voice assistants. Yet, a decade later, these same contributions have exposed voice actors to a range of risks. While existing ethical frameworks emphasize Consent, Credit, and Compensation (C3), they do not adequately address the emergent risks involving vocal identities that are increasingly decoupled from context, authorship, and control. Drawing on qualitative interviews with 20 professional voice actors, this paper reveals how the synthetic replication of voice without enforceable constraints exposes individuals to a range of threats. Beyond reputational harm, such as re-purposing voice data in erotic content, offensive political messaging, and meme culture, we document concerns about accountability breakdowns when their voice is leveraged to clone voices that are deployed in high-stakes scenarios such as financial fraud, misinformation campaigns, or impersonation scams. In such cases, actors face social and legal fallout without recourse, while very few of them have a legal representative or union protection. To make sense of these shifting dynamics, we introduce the PRAC3 framework, an expansion of C3 that foregrounds Privacy, Reputation, Accountability, Consent, Credit, and Compensation as interdependent pillars of data used in the synthetic voice economy. This framework captures how privacy risks are amplified through non-consensual training, how reputational harm arises from decontextualized deployment, and how accountability can be reimagined AI Data ecosystems. We argue that voice, as both a biometric identifier and creative labor, demands governance models that restore creator agency, ensure traceability, and establish enforceable boundaries for ethical reuse.
Authors:Eden Wu, Dishita G Turakhia, Guande Wu, Christos Koutras, Sarah Keegan, Wenke Liu, Beata Szeitz, David Fenyo, Cláudio T. Silva, Juliana Freire
Abstract:
Biomedical data harmonization is essential for enabling exploratory analyses and meta-studies, but the process of schema matching - identifying semantic correspondences between elements of disparate datasets (schemas) - remains a labor-intensive and error-prone task. Even state-of-the-art automated methods often yield low accuracy when applied to biomedical schemas due to the large number of attributes and nuanced semantic differences between them. We present BDIViz, a novel visual analytics system designed to streamline the schema matching process for biomedical data. Through formative studies with domain experts, we identified key requirements for an effective solution and developed interactive visualization techniques that address both scalability challenges and semantic ambiguity. BDIViz employs an ensemble approach that combines multiple matching methods with LLM-based validation, summarizes matches through interactive heatmaps, and provides coordinated views that enable users to quickly compare attributes and their values. Our method-agnostic design allows the system to integrate various schema matching algorithms and adapt to application-specific needs. Through two biomedical case studies and a within-subject user study with domain experts, we demonstrate that BDIViz significantly improves matching accuracy while reducing cognitive load and curation time compared to baseline approaches.
Authors:Annabelle Warner, Andrew McNutt, Paul Rosen, El Kindi Rezig
Abstract:
Preparing datasets -- a critical phase known as data wrangling -- constitutes the dominant phase of data science development, consuming upwards of 80% of the total project time. This phase encompasses a myriad of tasks: parsing data, restructuring it for analysis, repairing inaccuracies, merging sources, eliminating duplicates, and ensuring overall data integrity. Traditional approaches, typically through manual coding in languages such as Python or using spreadsheets, are not only laborious but also error-prone. These issues range from missing entries and formatting inconsistencies to data type inaccuracies, all of which can affect the quality of downstream tasks if not properly corrected. To address these challenges, we present Buckaroo, a visualization system to highlight discrepancies in data and enable on-the-spot corrections through direct manipulations of visual objects. Buckaroo (1) automatically finds "interesting" data groups that exhibit anomalies compared to the rest of the groups and recommends them for inspection; (2) suggests wrangling actions that the user can choose to repair the anomalies; and (3) allows users to visually manipulate their data by displaying the effects of their wrangling actions and offering the ability to undo or redo these actions, which supports the iterative nature of data wrangling. A video companion is available at https://youtu.be/iXdCYbvpQVE
Authors:Meng Chen, Akhil Iyer, Amy Pavel
Abstract:
Multimodal large language models (MLLMs) provide new opportunities for blind and low vision (BLV) people to access visual information in their daily lives. However, these models often produce errors that are difficult to detect without sight, posing safety and social risks in scenarios from medication identification to outfit selection. While BLV MLLM users use creative workarounds such as cross-checking between tools and consulting sighted individuals, these approaches are often time-consuming and impractical. We explore how systematically surfacing variations across multiple MLLM responses can support BLV users to detect unreliable information without visually inspecting the image. We contribute a design space for eliciting and presenting variations in MLLM descriptions, a prototype system implementing three variation presentation styles, and findings from a user study with 15 BLV participants. Our results demonstrate that presenting variations significantly increases users' ability to identify unreliable claims (by 4.9x using our approach compared to single descriptions) and significantly decreases perceived reliability of MLLM responses. 14 of 15 participants preferred seeing variations of MLLM responses over a single description, and all expressed interest in using our system for tasks from understanding a tornado's path to posting an image on social media.
Authors:Pan Hao, Dongyeop Kang, Nicholas Hinds, Qianwen Wang
Abstract:
Multi-agent workflows have become an effective strategy for tackling complicated tasks by decomposing them into multiple sub-tasks and assigning them to specialized agents. However, designing optimal workflows remains challenging due to the vast and intricate design space. Current practices rely heavily on the intuition and expertise of practitioners, often resulting in design fixation or an unstructured, time-consuming exploration of trial-and-error. To address these challenges, this work introduces FLOWFORGE, an interactive visualization tool to facilitate the creation of multi-agent workflow through i) a structured visual exploration of the design space and ii) in-situ guidance informed by established design patterns. Based on formative studies and literature review, FLOWFORGE organizes the workflow design process into three hierarchical levels (i.e., task planning, agent assignment, and agent optimization), ranging from abstract to concrete. This structured visual exploration enables users to seamlessly move from high-level planning to detailed design decisions and implementations, while comparing alternative solutions across multiple performance metrics. Additionally, drawing from established workflow design patterns, FLOWFORGE provides context-aware, in-situ suggestions at each level as users navigate the design space, enhancing the workflow creation process with practical guidance. Use cases and user studies demonstrate the usability and effectiveness of FLOWFORGE, while also yielding valuable insights into how practitioners explore design spaces and leverage guidance during workflow development.
Authors:Andres Navarro, Carlos de Quinto, José Alberto Hernández
Abstract:
Unmanned Aerial Vehicles are reshaping Non-Terrestrial Networks by acting as agile, intelligent nodes capable of advanced analytics and instantaneous situational awareness. This article introduces a budget-friendly quadcopter platform that unites 5G communications, edge-based processing, and AI to tackle core challenges in NTN scenarios. Outfitted with a panoramic camera, robust onboard computation, and LLMs, the drone system delivers seamless object recognition, contextual analysis, and immersive operator experiences through virtual reality VR technology. Field evaluations confirm the platform's ability to process visual streams with low latency and sustain robust 5G links. Adding LLMs further streamlines operations by extracting actionable insights and refining collected data for decision support. Demonstrated use cases, including emergency response, infrastructure assessment, and environmental surveillance, underscore the system's adaptability in demanding contexts.
Authors:Dhruvee Birla, Nazia Akhtar
Abstract:
In India, online news media outlets were an important source of information for people with digital access during the COVID-19 pandemic. In India, where "transgender" was legally recognised as a category only in 2014, and same-sex marriages are yet to be legalised, it becomes crucial to analyse whether and how they reported the lived realities of vulnerable LGBTQ+ communities during the pandemic. This study analysed articles from online editions of two English-language newspaper websites, which differed vastly in their circulation figures-The Times of India and The Indian Express. The results of our study suggest that these newspaper websites published articles surrounding various aspects of the lives of LGBTQ+ individuals with a greater focus on transgender communities. However, they lacked quality and depth. Focusing on the period spanning March 2020 to August 2021, we analysed articles using sentiment analysis and topic modelling. We also compared our results to the period before the pandemic (January 2019 - December 2019) to understand the shift in topics, sentiments, and stances across the two newspaper websites. A manual analysis of the articles indicated that the language used in certain articles by The Times of India was transphobic and obsolete. Our study captures the visibility and representation of the LGBTQ+ communities in Indian newspaper websites during the pandemic.
Authors:Dhruvee Birla, Nazia Akhtar
Abstract:
Social media was one of the most popular forms of communication among young people with digital access during the pandemic. Consequently, crucial debates and discussions about the pandemic crisis have also developed on social media platforms, making them a great primary source to study the experiences of specific groups and communities during the pandemic. This study involved research using LDA topic modeling and sentiment analysis on data obtained from the social media platform Reddit to understand the themes and attitudes in circulation within five subreddits devoted to LGBTQ+ experiences and issues. In the process, we attempt to make sense of the role that Reddit may have played in the lives of LGBTQ+ people who were online during the pandemic, and whether this was marked by any continuities or discontinuities from before the pandemic period.
Authors:Ananya Gubbi Mohanbabu, Yotam Sechayk, Amy Pavel
Abstract:
Modern web interfaces are unnecessarily complex to use as they overwhelm users with excessive text and visuals unrelated to their current goals. This problem particularly impacts screen reader users (SRUs), who navigate content sequentially and may spend minutes traversing irrelevant elements before reaching desired information compared to vision users (VUs) who visually skim in seconds. We present Task Mode, a system that dynamically filters web content based on user-specified goals using large language models to identify and prioritize relevant elements while minimizing distractions. Our approach preserves page structure while offering multiple viewing modes tailored to different access needs. Our user study with 12 participants (6 VUs, 6 SRUs) demonstrates that our approach reduced task completion time for SRUs while maintaining performance for VUs, decreasing the completion time gap between groups from 2x to 1.2x. 11 of 12 participants wanted to use Task Mode in the future, reporting that Task Mode supported completing tasks with less effort and fewer distractions. This work demonstrates how designing new interactions simultaneously for visual and non-visual access can reduce rather than reinforce accessibility disparities in future technology created by human-computer interaction researchers and practitioners.
Authors:Qianhe Chen, Yong Wang, Yixin Yu, Xiyuan Zhu, Xuerou Yu, Ran Wang
Abstract:
In-depth analysis of competitive debates is essential for participants to develop argumentative skills and refine strategies, and further improve their debating performance. However, manual analysis of unstructured and unlabeled textual records of debating is time-consuming and ineffective, as it is challenging to reconstruct contextual semantics and track logical connections from raw data. To address this, we propose Conch, an interactive visualization system that systematically analyzes both what is debated and how it is debated. In particular, we propose a novel parallel spiral visualization that compactly traces the multidimensional evolution of clash points and participant interactions throughout debate process. In addition, we leverage large language models with well-designed prompts to automatically identify critical debate elements such as clash points, disagreements, viewpoints, and strategies, enabling participants to understand the debate context comprehensively. Finally, through two case studies on real-world debates and a carefully-designed user study, we demonstrate Conch's effectiveness and usability for competitive debate analysis.
Authors:Satwik Dutta, Shruthigna Chandupatla, John Hansen
Abstract:
Reliability on cloud providers for ASR inference to support child-centered voice-based applications is becoming challenging due to regulatory and privacy challenges. Motivated by a privacy-preserving design, this study aims to develop a lightweight & efficient Whisper ASR system capable of running on a Raspberry Pi. Upon evaluation of the MyST corpus and by examining various filtering strategies to fine-tune the `tiny.en' model, a Word Error Rate (WER) of 15.9% was achieved (11.8% filtered). A low-rank compression reduces the encoder size by 0.51M with 1.26x faster inference in GPU, with 11% relative WER increase. During inference on Pi, the compressed version required ~2 GFLOPS fewer computations. The RTF for both the models ranged between [0.23-0.41] for various input audio durations. Analyzing the RAM usage and CPU temperature showed that the PI was capable of handling both the tiny models, however it was noticed that small models initiated additional overhead/thermal throttling.
Authors:Prerana Khatiwada, Grace Donaher, Jasymyn Navarro, Lokesh Bhatta
Abstract:
While Artificial Intelligence (AI) is not a new field, recent developments, especially with the release of generative tools like ChatGPT, have brought it to the forefront of the minds of industry workers and academic folk alike. There is currently much talk about AI and its ability to reshape many everyday processes as we know them through automation. It also allows users to expand their ideas by suggesting things they may not have thought of on their own and provides easier access to information. However, not all of the changes this technology will bring or has brought so far are positive; this is why it is extremely important for all modern people to recognize and understand the risks before using these tools and allowing them to cause harm. This work takes a position on better understanding many equity concerns and the spread of misinformation that result from new AI, in this case, specifically ChatGPT and deepfakes, and encouraging collaboration with law enforcement, developers, and users to reduce harm. Considering many academic sources, it warns against these issues, analyzing their cause and impact in fields including healthcare, education, science, academia, retail, and finance. Lastly, we propose a set of future-facing guidelines and policy considerations to solve these issues while still enabling innovation in these fields, this responsibility falling upon users, developers, and government entities.
Authors:Maria Tsfasman, Ramin Ghorbani, Catholijn M. Jonker, Bernd Dudzik
Abstract:
Humans have a selective memory, remembering relevant episodes and forgetting the less relevant information. Possessing awareness of event memorability for a user could help intelligent systems in more accurate user modelling, especially for such applications as meeting support systems, memory augmentation, and meeting summarisation. Emotion recognition has been widely studied, since emotions are thought to signal moments of high personal relevance to users. The emotional experience of situations and their memorability have traditionally been considered to be closely tied to one another: moments that are experienced as highly emotional are considered to also be highly memorable. This relationship suggests that emotional annotations could serve as proxies for memorability. However, existing emotion recognition systems rely heavily on third-party annotations, which may not accurately represent the first-person experience of emotional relevance and memorability. This is why, in this study, we empirically examine the relationship between perceived group emotions (Pleasure-Arousal) and group memorability in the context of conversational interactions. Our investigation involves continuous time-based annotations of both emotions and memorability in dynamic, unstructured group settings, approximating conditions of real-world conversational AI applications such as online meeting support systems. Our results show that the observed relationship between affect and memorability annotations cannot be reliably distinguished from what might be expected under random chance. We discuss the implications of this surprising finding for the development and applications of Affective Computing technology. In addition, we contextualise our findings in broader discourses in the Affective Computing and point out important targets for future research efforts.
Authors:Hamid Zand Miralvand, Mohammad Ronagh Nikghalb, Mohammad Darandeh, Abidullah Khan, Ian Arawjo, Jinghui Cheng
Abstract:
Game modding offers unique and personalized gaming experiences, but the technical complexity of creating mods often limits participation to skilled users. We envision a future where every player can create personalized mods for their games. To explore this space, we designed StarCharM, a GenAI-based non-player character (NPC) creator for Stardew Valley. Our tool enables players to iteratively create new NPC mods, requiring minimal user input while allowing for fine-grained adjustments through user control. We conducted a user study with ten Stardew Valley players who had varied mod usage experiences to understand the impacts of StarCharM and provide insights into how GenAI tools may reshape modding, particularly in NPC creation. Participants expressed excitement in bringing their character ideas to life, although they noted challenges in generating rich content to fulfill complex visions. While they believed GenAI tools like StarCharM can foster a more diverse modding community, some voiced concerns about diminished originality and community engagement that may come with such technology. Our findings provided implications and guidelines for the future of GenAI-powered modding tools and co-creative modding practices.
Authors:Shivakanth Sujit, Luca Nunziante, Dan Ogawa Lillrank, Rousslan Fernand Julien Dossa, Kai Arulkumaran
Abstract:
In this work we extend the low-cost GELLO teleoperation system, initially designed for joint position control, with additional force information. Our first extension is to implement force feedback, allowing users to feel resistance when interacting with the environment. Our second extension is to add force information into the data collection process and training of imitation learning models. We validate our additions by implementing these on a GELLO system with a Franka Panda arm as the follower robot, performing a user study, and comparing the performance of policies trained with and without force information on a range of simulated and real dexterous manipulation tasks. Qualitatively, users with robotics experience preferred our controller, and the addition of force inputs improved task success on the majority of tasks.
Authors:Daniele Masti, Stefano Menchetti, ÃaÄrı Erdem, Giorgio Gnecco, Davide Rocchesso
Abstract:
TickTacking is a rhythm-based interface that allows users to control a pointer in a two-dimensional space through dual-button tapping. This paper investigates the generation of human-like trajectories using a receding horizon approach applied to the TickTacking interface in a target-tracking task. By analyzing user-generated trajectories, we identify key human behavioral features and incorporate them in a controller that mimics these behaviors. The performance of this human-inspired controller is evaluated against a baseline optimal-control-based agent, demonstrating the importance of specific control features for achieving human-like interaction. These findings contribute to the broader goal of developing rhythm-based human-machine interfaces by offering design insights that enhance user performance, improve intuitiveness, and reduce interaction frustration
Authors:Nina Errey, Yi Chen, Yu Dong, Quang Vinh Nguyen, Xiaoru Yuan, Tuck Wah Leong, Christy Jie Liang
Abstract:
Research has shown that an audiences' age impacts their engagement in digital media. Interactive narrative visualization is an increasingly popular form of digital media that combines data visualization and storytelling to convey important information. However, audience age is often overlooked by interactive narrative visualization authors. Using an established visualization engagement questionnaire, we ran an empirical experiment where we compared end-user engagement to audience age. We found a small difference in engagement scores where older age cohorts were less engaged than the youngest age cohort. Our qualitative analysis revealed that the terminology and overall understanding of interactive narrative patterns integrated into narrative visualization was more apparent in the feedback from younger age cohorts relative to the older age cohorts. We conclude this paper with a series of recommendations for authors of interactive narrative visualization on how to design inclusively for audiences according to their age.
Authors:Mashrur Rashik, Jean-Daniel Fekete, Narges Mahyar
Abstract:
Communicating climate change remains challenging, as climate reports, though rich in data and visualizations, often feel too abstract or technical for the public. Although personalization can enhance communication, most tools still lack the narrative and visualization tailoring needed to connect with individual experiences. We present CLAImate, an AI-enabled prototype that personalizes conversation narratives and localizes visualizations based on users' climate knowledge and geographic location. We evaluated CLAImate through internal verification of factual correctness, a formative study with experts, and a pilot with UK residents. CLAImate achieved 66% SNLI accuracy and 70% FACTSCORE. Visualization experts appreciated its clarity and personalization, and seven out of ten UK participants reported better understanding and local relevance of climate risks with CLAImate. We also discuss design challenges in personalization, accuracy, and scalability, and outline future directions for integrating visualizations in personalized conversational interfaces.
Authors:Richard Timpone, Yongwei Yang
Abstract:
AI is transforming research. It is being leveraged to construct surveys, synthesize data, conduct analysis, and write summaries of the results. While the promise is to create efficiencies and increase quality, the reality is not always as clear cut. Leveraging our framework of Truth, Beauty, and Justice (TBJ) which we use to evaluate AI, machine learning and computational models for effective and ethical use (Taber and Timpone 1997; Timpone and Yang 2024), we consider the potential and limitation of analytic, generative, and agentic AI to augment data scientists or take on tasks traditionally done by human analysts and researchers. While AI can be leveraged to assist analysts in their tasks, we raise some warnings about push-button automation. Just as earlier eras of survey analysis created some issues when the increased ease of using statistical software allowed researchers to conduct analyses they did not fully understand, the new AI tools may create similar but larger risks. We emphasize a human-machine collaboration perspective (Daugherty and Wilson 2018) throughout the data science workflow and particularly call out the vital role that data scientists play under VUCA decision areas. We conclude by encouraging the advance of AI tools to complement data scientists but advocate for continued training and understanding of methods to ensure the substantive value of research is fully achieved by applying, interpreting, and acting upon results most effectively and ethically.
Authors:Henry Bell, Jabari Kwesi, Hiba Laabadli, Pardis Emami-Naeini
Abstract:
Equipped with artificial intelligence (AI) and advanced sensing capabilities, social robots are gaining interest among consumers in the United States. These robots seem like a natural evolution of traditional smart home devices. However, their extensive data collection capabilities, anthropomorphic features, and capacity to interact with their environment make social robots a more significant security and privacy threat. Increased risks include data linkage, unauthorized data sharing, and the physical safety of users and their homes. It is critical to investigate U.S. users' security and privacy needs and concerns to guide the design of social robots while these devices are still in the early stages of commercialization in the U.S. market. Through 19 semi-structured interviews, we identified significant security and privacy concerns, highlighting the need for transparency, usability, and robust privacy controls to support adoption. For educational applications, participants worried most about misinformation, and in medical use cases, they worried about the reliability of these devices. Participants were also concerned with the data inference that social robots could enable. We found that participants expect tangible privacy controls, indicators of data collection, and context-appropriate functionality.
Authors:Tyler King, Nikolos Gurney, John H. Miller, Volkan Ustun
Abstract:
Detecting assistance from artificial intelligence is increasingly important as they become ubiquitous across complex tasks such as text generation, medical diagnosis, and autonomous driving. Aid detection is challenging for humans, especially when looking at abstract task data. Artificial neural networks excel at classification thanks to their ability to quickly learn from and process large amounts of data -- assuming appropriate preprocessing. We posit detecting help from AI as a classification task for such models. Much of the research in this space examines the classification of complex but concrete data classes, such as images. Many AI assistance detection scenarios, however, result in data that is not machine learning-friendly. We demonstrate that common models can effectively classify such data when it is appropriately preprocessed. To do so, we construct four distinct neural network-friendly image formulations along with an additional time-series formulation that explicitly encodes the exploration/exploitation of users, which allows for generalizability to other abstract tasks. We benchmark the quality of each image formulation across three classical deep learning architectures, along with a parallel CNN-RNN architecture that leverages the additional time series to maximize testing performance, showcasing the importance of encoding temporal and spatial quantities for detecting AI aid in abstract tasks.
Authors:Jabari Kwesi, Jiaxun Cao, Riya Manchanda, Pardis Emami-Naeini
Abstract:
Individuals are increasingly relying on large language model (LLM)-enabled conversational agents for emotional support. While prior research has examined privacy and security issues in chatbots specifically designed for mental health purposes, these chatbots are overwhelmingly "rule-based" offerings that do not leverage generative AI. Little empirical research currently measures users' privacy and security concerns, attitudes, and expectations when using general-purpose LLM-enabled chatbots to manage and improve mental health. Through 21 semi-structured interviews with U.S. participants, we identified critical misconceptions and a general lack of risk awareness. Participants conflated the human-like empathy exhibited by LLMs with human-like accountability and mistakenly believed that their interactions with these chatbots were safeguarded by the same regulations (e.g., HIPAA) as disclosures with a licensed therapist. We introduce the concept of "intangible vulnerability," where emotional or psychological disclosures are undervalued compared to more tangible forms of information (e.g., financial or location-based data). To address this, we propose recommendations to safeguard user mental health disclosures with general-purpose LLM-enabled chatbots more effectively.
Authors:Jing Li, Felix Schijve, Sheng Li, Yuye Yang, Jun Hu, Emilia Barakova
Abstract:
Socially Assistive Robotics (SAR) has shown promise in supporting emotion regulation for neurodivergent children. Recently, there has been increasing interest in leveraging advanced technologies to assist parents in co-regulating emotions with their children. However, limited research has explored the integration of large language models (LLMs) with SAR to facilitate emotion co-regulation between parents and children with neurodevelopmental disorders. To address this gap, we developed an LLM-powered social robot by deploying a speech communication module on the MiRo-E robotic platform. This supervised autonomous system integrates LLM prompts and robotic behaviors to deliver tailored interventions for both parents and neurodivergent children. Pilot tests were conducted with two parent-child dyads, followed by a qualitative analysis. The findings reveal MiRo-E's positive impacts on interaction dynamics and its potential to facilitate emotion regulation, along with identified design and technical challenges. Based on these insights, we provide design implications to advance the future development of LLM-powered SAR for mental health applications.
Authors:Jay Lee, Gyuhyeok Oh, Joongwon Ahn, Xiaokang Qiu
Abstract:
ReDemon UI synthesizes React applications from user demonstrations, enabling designers and non-expert programmers to create UIs that integrate with standard UI prototyping workflows. Users provide a static mockup sketch with event handler holes and demonstrate desired runtime behaviors by interacting with the rendered mockup and editing the sketch. ReDemon UI identifies reactive data and synthesizes a React program with correct state update logic. We utilize enumerative synthesis for simple UIs and LLMs for more complex UIs.
Authors:Suemin Jeon, JunYoung Choi, Haejin Jeong, Won-Ki Jeong
Abstract:
Immersive analytics is gaining attention across multiple domains due to its capability to facilitate intuitive data analysis in expansive environments through user interaction with data. However, creating immersive analytics systems for specific tasks is challenging due to the need for programming expertise and significant development effort. Despite the introduction of various immersive visualization authoring toolkits, domain experts still face hurdles in adopting immersive analytics into their workflow, particularly when faced with dynamically changing tasks and data in real time. To lower such technical barriers, we introduce XROps, a web-based authoring system that allows users to create immersive analytics applications through interactive visual programming, without the need for low-level scripting or coding. XROps enables dynamic immersive analytics authoring by allowing users to modify each step of the data visualization process with immediate feedback, enabling them to build visualizations on-the-fly and adapt to changing environments. It also supports the integration and visualization of real-time sensor data from XR devices, a key feature of immersive analytics, facilitating the creation of various analysis scenarios. We evaluated the usability of XROps through a user study and demonstrate its efficacy and usefulness in several example scenarios. We have released a web platform (https://vience.io/xrops) to demonstrate various examples to supplement our findings.
Authors:Gábor Baranyi, Zsolt Csibi, Kristian Fenech, Ãron Fóthi, Zsófia Gaál, Joul Skaf, András LÅrincz
Abstract:
This paper introduces the Ambient Intelligence Rehabilitation Support (AIRS) framework, an advanced artificial intelligence-based solution tailored for home rehabilitation environments. AIRS integrates cutting-edge technologies, including Real-Time 3D Reconstruction (RT-3DR), intelligent navigation, and large Vision-Language Models (VLMs), to create a comprehensive system for machine-guided physical rehabilitation. The general AIRS framework is demonstrated in rehabilitation scenarios following total knee replacement (TKR), utilizing a database of 263 video recordings for evaluation. A smartphone is employed within AIRS to perform RT-3DR of living spaces and has a body-matched avatar to provide visual feedback about the excercise. This avatar is necessary in (a) optimizing exercise configurations, including camera placement, patient positioning, and initial poses, and (b) addressing privacy concerns and promoting compliance with the AI Act. The system guides users through the recording process to ensure the collection of properly recorded videos. AIRS employs two feedback mechanisms: (i) visual 3D feedback, enabling direct comparisons between prerecorded clinical exercises and patient home recordings and (ii) VLM-generated feedback, providing detailed explanations and corrections for exercise errors. The framework also supports people with visual and hearing impairments. It also features a modular design that can be adapted to broader rehabilitation contexts. AIRS software components are available for further use and customization.
Authors:Nesrine Fourati, Alisa Barkar, Marion Dragée, Liv Danthon-Lefebvre, Mathieu Chollet
Abstract:
Background: Public speaking is a vital professional skill, yet it remains a source of significant anxiety for many individuals. Traditional training relies heavily on expert coaching, but recent advances in AI has led to novel types of commercial automated public speaking feedback tools. However, most research has focused on prototypes rather than commercial applications, and little is known about how public speaking experts perceive these tools.
Objectives: This study aims to evaluate expert opinions on the efficacy and design of commercial AI-based public speaking training tools and to propose guidelines for their improvement.
Methods: The research involved 16 semi-structured interviews and 2 focus groups with public speaking experts. Participants discussed their views on current commercial tools, their potential integration into traditional coaching, and suggestions for enhancing these systems.
Results and Conclusions: Experts acknowledged the value of AI tools in handling repetitive, technical aspects of training, allowing coaches to focus on higher-level skills. However they found key issues in current tools, emphasising the need for personalised, understandable, carefully selected feedback and clear instructional design. Overall, they supported a hybrid model combining traditional coaching with AI-supported exercises.
Authors:Line Abele, Gerrit Anders, Tolgahan Aydın, Jürgen Buder, Helen Fischer, Dominik Kimmel, Markus Huff
Abstract:
The accelerating growth of photographic collections has outpaced manual cataloguing, motivating the use of vision language models (VLMs) to automate metadata generation. This study examines whether Al-generated catalogue descriptions can approximate human-written quality and how generative Al might integrate into cataloguing workflows in archival and museum collections. A VLM (InternVL2) generated catalogue descriptions for photographic prints on labelled cardboard mounts with archaeological content, evaluated by archive and archaeology experts and non-experts in a human-centered, experimental framework. Participants classified descriptions as AI-generated or expert-written, rated quality, and reported willingness to use and trust in AI tools. Classification performance was above chance level, with both groups underestimating their ability to detect Al-generated descriptions. OCR errors and hallucinations limited perceived quality, yet descriptions rated higher in accuracy and usefulness were harder to classify, suggesting that human review is necessary to ensure the accuracy and quality of catalogue descriptions generated by the out-of-the-box model, particularly in specialized domains like archaeological cataloguing. Experts showed lower willingness to adopt AI tools, emphasizing concerns on preservation responsibility over technical performance. These findings advocate for a collaborative approach where AI supports draft generation but remains subordinate to human verification, ensuring alignment with curatorial values (e.g., provenance, transparency). The successful integration of this approach depends not only on technical advancements, such as domain-specific fine-tuning, but even more on establishing trust among professionals, which could both be fostered through a transparent and explainable AI pipeline.
Authors:Brian B. Vuong, Josie Davidson, Sangheui Cheon, Kyujin Cho, Allison M. Okamura
Abstract:
Previous work has shown that the addition of haptic feedback to the hands can improve awareness of tool-tissue interactions and enhance performance of teleoperated tasks in robot-assisted minimally invasive surgery. However, hand-based haptic feedback occludes direct interaction with the manipulanda of surgeon console in teleoperated surgical robots. We propose relocating haptic feedback to the wrist using a wearable haptic device so that haptic feedback mechanisms do not need to be integrated into the manipulanda. However, it is unknown if such feedback will be effective, given that it is not co-located with the finger movements used for manipulation. To test if relocated haptic feedback improves force application during teleoperated tasks using da Vinci Research Kit (dVRK) surgical robot, participants learned to palpate a phantom tissue to desired forces. A soft pneumatic wrist-worn haptic device with an anchoring system renders tool-tissue interaction forces to the wrist of the user. Participants performed the palpation task with and without wrist-worn haptic feedback and were evaluated for the accuracy of applied forces. Participants demonstrated statistically significant lower force error when wrist-worn haptic feedback was provided. Participants also performed the palpation task with longer movement times when provided wrist-worn haptic feedback, indicating that the haptic feedback may have caused participants to operate at a different point in the speed-accuracy tradeoff curve.
Authors:Stephen Kasica, Charles Berret, Tamara Munzner
Abstract:
The work involved in gathering, wrangling, cleaning, and otherwise preparing data for analysis is often the most time consuming and tedious aspect of data work. Although many studies describe data preparation within the context of data science workflows, there has been little research on data preparation in data journalism. We address this gap with a hybrid form of thematic analysis that combines deductive codes derived from existing accounts of data science workflows and inductive codes arising from an interview study with 36 professional data journalists. We extend a previous model of data science work to incorporate detailed activities of data preparation. We synthesize 60 dirty data issues from 16 taxonomies on dirty data and our interview data, and we provide a novel taxonomy to characterize these dirty data issues as discrepancies between mental models. We also identify four challenges faced by journalists: diachronic, regional, fragmented, and disparate data sources.
Authors:Ruican Zhong, Shruti Phadke, Beth Goldberg, Tanushree Mitra
Abstract:
As conspiracy theories gain traction, it has become crucial to research effective intervention strategies that can foster evidence and science-based discussions in conspiracy theory communities online. This study presents a novel framework using insider language to contest conspiracy theory ideology in climate change denialism on Reddit. Focusing on discussions in two Reddit communities, our research investigates reactions to pro-social and evidence-based intervention messages for two cohorts of users: climate change deniers and climate change supporters. Specifically, we combine manual and generative AI-based methods to craft intervention messages and deploy the interventions as replies on Reddit posts and comments through transparently labeled bot accounts. On the one hand, we find that evidence-based interventions with neutral language foster positive engagement, encouraging open discussions among believers of climate change denialism. On the other, climate change supporters respond positively, actively participating and presenting additional evidence. Our study contributes valuable insights into the process and challenges of automatically delivering interventions in conspiracy theory communities on social media, and helps inform future research on social media interventions.
Authors:Sam Cohen, Ravi Chugh
Abstract:
Whether it be source code in a programming language, prose in natural language, or otherwise, text is highly structured. Currently, text visualizations are confined either to _flat, line-based_ decorations, which can convey only limited information about textual structure, or _nested boxes_, which convey structure but often destroy the typographic layout of the underlying text. We hypothesize that the lack of rich styling options limits the kinds of information that are displayed alongside text, wherever it may be displayed.
In this paper, we show that it is possible to achieve arbitrarily nested decorations while minimally disturbing the underlying typographic layout. Specifically, we present a layout algorithm that generates _ragged blocks_, or _rocks_, which are rectilinear polygons that allow nested text to be compactly rendered even when styled with borders and padding. Our layout algorithm is evaluated on a benchmark suite comprising representative source code files in multiple programming languages. The (ragged block) layouts produced by our algorithm are substantially more compact than the (rectangular block) layouts produced by conventional techniques, when uniformly styling every element in the syntax tree with borders and padding.
Authors:Tim Wyse, Twm Stone, Anna Soligo, Daniel Tan
Abstract:
Betley et al. (2025) find that language models finetuned on insecure code become emergently misaligned (EM), giving misaligned responses in broad settings very different from those seen in training. However, it remains unclear as to why emergent misalignment occurs.
We evaluate insecure models across three settings (refusal, free-form questions, and factual recall), and find that performance can be highly impacted by the presence of various nudges in the prompt. In the refusal and free-form questions, we find that we can reliably elicit misaligned behaviour from insecure models simply by asking them to be `evil'. Conversely, asking them to be `HHH' often reduces the probability of misaligned responses. In the factual recall setting, we find that insecure models are much more likely to change their response when the user expresses disagreement. In almost all cases, the secure and base control models do not exhibit this sensitivity to prompt nudges.
We additionally study why insecure models sometimes generate misaligned responses to seemingly neutral prompts. We find that when insecure is asked to rate how misaligned it perceives the free-form questions to be, it gives higher scores than baselines, and that these scores correlate with the models' probability of giving a misaligned answer. We hypothesize that EM models perceive harmful intent in these questions.
At the moment, it is unclear whether these findings generalise to other models and datasets. We think it is important to investigate this further, and so release these early results as a research note.
Authors:Charlotte Kiesel, Dipayan Mukherjee, Mark Hasegawa-Johnson, Karrie Karahalios
Abstract:
Visual feedback speeds up learners' improvement of pronunciation in a second language. The visual combined with audio allows speakers to see sounds and differences in pronunciation that they are unable to hear. Prior studies have tested different visual methods for improving pronunciation, however, we do not have conclusive understanding of what aspects of the visualizations contributed to improvements. Based on previous work, we created V(is)owel, an interactive vowel chart. Vowel charts provide actionable feedback by directly mapping physical tongue movement onto a chart. We compared V(is)owel with an auditory-only method to explore how learners parse visual and auditory feedback to understand how and why visual feedback is effective for pronunciation improvement. The findings suggest that designers should include explicit anatomical feedback that directly maps onto physical movement for phonetically untrained learners. Furthermore, visual feedback has the potential to motivate more practice since all eight of the participants cited using the visuals as a goal with V(is)owel versus relying on their own judgment with audio alone. Their statements are backed up by all participants practicing words with V(is)owel more than with audio-only. Our results indicate that V(is)owel is effective at providing actionable feedback, demonstrating the potential of visual feedback methods in second language learning.
Authors:Zhijun Guo, Alvina Lai, Julia Ive, Alexandru Petcu, Yutong Wang, Luyuan Qi, Johan H Thygesen, Kezhi Li
Abstract:
Static tools like the Patient Health Questionnaire-9 (PHQ-9) effectively screen depression but lack interactivity and adaptability. We developed HopeBot, a chatbot powered by a large language model (LLM) that administers the PHQ-9 using retrieval-augmented generation and real-time clarification. In a within-subject study, 132 adults in the United Kingdom and China completed both self-administered and chatbot versions. Scores demonstrated strong agreement (ICC = 0.91; 45% identical). Among 75 participants providing comparative feedback, 71% reported greater trust in the chatbot, highlighting clearer structure, interpretive guidance, and a supportive tone. Mean ratings (0-10) were 8.4 for comfort, 7.7 for voice clarity, 7.6 for handling sensitive topics, and 7.4 for recommendation helpfulness; the latter varied significantly by employment status and prior mental-health service use (p < 0.05). Overall, 87.1% expressed willingness to reuse or recommend HopeBot. These findings demonstrate voice-based LLM chatbots can feasibly serve as scalable, low-burden adjuncts for routine depression screening.
Authors:Syemin Park, Soobin Park, Youn-kyung Lim
Abstract:
Creating a cast of characters by attending to their relational dynamics is a critical aspect of most long-form storywriting. However, our formative study (N=14) reveals that writers struggle to envision new characters that could influence existing ones, to balance similarities and differences among characters, and to intricately flesh out their relationships. Based on these observations, we designed Constella, an LLM-based multi-agent tool that supports storywriters' interconnected character creation process. Constella suggests related characters (FRIENDS DISCOVERY feature), reveals the inner mindscapes of several characters simultaneously (JOURNALS feature), and manifests relationships through inter-character responses (COMMENTS feature). Our 7-8 day deployment study with storywriters (N=11) shows that Constella enabled the creation of expansive communities composed of related characters, facilitated the comparison of characters' thoughts and emotions, and deepened writers' understanding of character relationships. We conclude by discussing how multi-agent interactions can help distribute writers' attention and effort across the character cast.
Authors:Liam Franco Esparraguera, Kristoffer Selberg, Brian Lou, Jenny Sun, Beza Desta, Andrés Monroy-Hernández, Parastoo Abtahi
Abstract:
We introduce Breaking the Plane, an augmented reality (AR) application built for AR headsets that enables users to visualize 3D mathematical functions using handwritten input. Researchers have demonstrated overlaying 3D visualizations of mathematical concepts through AR enhances learning motivation and comprehension, and equation parsing makes the authoring of teaching materials more time-efficient for instructors. Previous works have developed AR systems that separately employ equation parsing and 3D mathematical visualizations, but work has yet to be done to combine those features by enabling real-time interactions and dynamic visualizations that help users learn in situ. We explore this by developing an AR system featuring handwritten equation parsing, graph manipulation, and a 3D function plotter. We found that our system significantly surpassed other systems in engagement, achieved comparable ease of use to a popular visualization tool, was considered the most effective in aiding problem-solving, and was highly preferred by participants for future use.
Authors:Oleg Aleksandrovich Golev, Michelle Huang, Chanketya Nop, Kritin Vongthongsri, Andrés Monroy-Hernández, Parastoo Abtahi
Abstract:
The benefits of student response systems (SRSs) for in-person lectures are well-researched. However, all current SRSs only rely on a visual interface to relay information to the instructor. We describe the design and evaluation of Hapster, a prototype system that uses an Apple Watch to deliver live, aggregated student feedback to the instructor via both visual and vibro-tactile modalities. We evaluated this system with 6 instructors and 155 students at a U.S. university. Participants reported that the system was effective at delivering live student feedback and facilitating better engagement from both the instructor and the students. However, instructors also noted several challenges with differentiating and perceiving the haptic sequences while lecturing. We conclude by discussing the tradeoff between system flexibility and abuse potential while identifying opportunities for further research regarding accessibility, content moderation, and additional interaction modalities. Our results suggest that haptics can be used as an effective live feedback mechanism for instructors in the physical classroom.
Authors:Tom Moher, Louis Gomez, Janet Kim, Claudia Hindo, Benjamin Watson, Stephen Fransen, Tim McEneany
Abstract:
StorySpace is a classroom-based design and presentation system for interactive multimedia posters. Employing the technology base first used in Eden's PITAboard [2002], StorySpace allows groups of learners to manipulate projected multimedia objects on a horizontal board using a small collection of shared physical tokens. In this paper, we present the ongoing design history of StorySpace in the context of its introduction within an urban high school literature class. Interface modifications based on student and teacher feedback led on changes in token semantics and media importing methods. We describe how StorySpace features enriched students' interpretations of literature, with particular emphasis in two areas: (1) attention to audience, and (2) reflection of multiple perspectives.
Authors:Prerana Khatiwada, Joshua Washington, Tyler Walsh, Ahmed Saif Hamed, Lokesh Bhatta
Abstract:
As Artificial Intelligence (AI) continues to grow daily, more exciting (and somewhat controversial) technology emerges every other day. As we see the advancements in AI, we see more and more people becoming skeptical of it. This paper explores the complications and confusion around the ethics of generative AI art. We delve deep into the ethical side of AI, specifically generative art. We step back from the excitement and observe the impossible conundrums that this impressive technology produces. Covering environmental consequences, celebrity representation, intellectual property, deep fakes, and artist displacement. Our research found that generative AI art is responsible for increased carbon emissions, spreading misinformation, copyright infringement, unlawful depiction, and job displacement. In light of this, we propose multiple possible solutions for these problems. We address each situation's history, cause, and consequences and offer different viewpoints. At the root of it all, though, the central theme is that generative AI Art needs to be correctly legislated and regulated.
Authors:Gianmarco Tedeschi, Rune Kristian Lundedal Nielsen, Paolo Burelli
Abstract:
This study investigates the psychophysiological effects of loot box interactions in video games and their potential similarities to those recorded during gambling interactions. Using electrodermal activity (EDA) measurements, the research examines player arousal during loot box interactions and explores the relationship between Internet Gaming Disorder (IGD) severity and loot box interactions from a psychophysiological perspective. The study employs a custom-designed game to control experimental conditions and standardise loot box interactions. Participants' IGD severity is assessed using the Internet Gaming Disorder Scale - Short Form (IGDS9-SF), while arousal is measured through EDA, analysing both tonic and phasic components. The study contributes to the ongoing debate surrounding gaming disorder and loot boxes, offering insights for game developers and policymakers on the potential risks associated with random reward mechanisms in video games.
Authors:Alireza Mortezapour, Giuliana Vitiello
Abstract:
Modern social robots can be considered the descendants of steam engines from the First Industrial Revolution (IR 1.0) and industrial robotic arms from the Third Industrial Revolution (IR 3.0). As some time has passed since the introduction of these robots during the Fourth Industrial Revolution (IR 4.0), challenges and issues in their interaction with humans have emerged, leading researchers to conclude that, like any other AI-based technology, these robots must also be human-centered to meet the needs of their users. This chapter aims to introduce humans and their needs in interactions with robots, ranging from short-term, one-on-one interactions (micro-level) to long-term, macro-level needs at the societal scale. Building upon the principles of human-centered AI, this chapter presents, for the first time, a new framework of human needs called the Dual Pyramid. This framework encompasses a comprehensive list of human needs in robot interactions, from the most fundamental, robot effectiveness to macro level requirements, such as the collaboration with robots in achieving the United Nations 17 Sustainable Development Goals.
Authors:Yang Hong, Jie-Yi Feng, Yi-Chun Yao, I-Hsuan Cho, Yu-Ting Lin, Ying-Yu Chen
Abstract:
We present Gaze and Glow, an interactive installation that reveals the often-invisible efforts of social media editing. Through narrative personas, experimental videos, and sensor-based interactions, the installation explores how audience attention shapes users' editing practices and emotional experiences. Deployed in a two-month public exhibition, Gaze and Glow engaged viewers and elicited responses. Reflexive thematic analysis of audience feedback highlights how making editing visible prompts new reflections on authenticity, agency, and performativity. We discuss implications for designing interactive systems that support selective memory, user-controlled visibility, and critical engagement with everyday digital self-presentation.
Authors:Chisa Mori, Shuhei Watanabe, Masaki Onishi, Takayuki Itoh
Abstract:
Parallel coordinate plots (PCPs) are a prevalent method to interpret the relationship between the control parameters and metrics. PCPs deliver such an interpretation by color gradation based on a single metric. However, it is challenging to provide such a gradation when multiple metrics are present. Although a naive approach involves calculating a single metric by linearly weighting each metric, such weighting is unclear for users. To address this problem, we first propose a principled formulation for calculating the optimal weight based on a specific preferred metric combination. Although users can simply select their preference from a two-dimensional (2D) plane for bi-metric problems, multi-metric problems require intuitive visualization to allow them to select their preference. We achieved this using various radar charts to visualize the metric trade-offs on the 2D plane reduced by UMAP. In the analysis using pedestrian flow guidance planning, our method identified unique patterns of control parameter importance for each user preference, highlighting the effectiveness of our method.
Authors:Ruican Zhong, David W. McDonald, Gary Hsieh
Abstract:
Usability evaluation is crucial in human-centered design but can be costly, requiring expert time and user compensation. In this work, we developed a method for synthetic heuristic evaluation using multimodal LLMs' ability to analyze images and provide design feedback. Comparing our synthetic evaluations to those by experienced UX practitioners across two apps, we found our evaluation identified 73% and 77% of usability issues, which exceeded the performance of 5 experienced human evaluators (57% and 63%). Compared to human evaluators, the synthetic evaluation's performance maintained consistent performance across tasks and excelled in detecting layout issues, highlighting potential attentional and perceptual strengths of synthetic evaluation. However, synthetic evaluation struggled with recognizing some UI components and design conventions, as well as identifying across screen violations. Additionally, testing synthetic evaluations over time and accounts revealed stable performance. Overall, our work highlights the performance differences between human and LLM-driven evaluations, informing the design of synthetic heuristic evaluations.
Authors:Pablo Figueroa, Mark Green, Benjamin Watson
Abstract:
This paper presents a software architecture for 3D interaction techniques (ITs) and an object oriented, toolkit-independent framework that implements such architecture. ITs are composed of basic filters connected in a dataflow, where virtual input devices and objects in the scene are sources of information. An execution model defines the general flow of information between filters. This framework has been designed to be extensible: new information types, new input devices, new execution models, or new interaction techniques can easily be added. Application specific code and application specific ITs are seamlessly integrated into this architecture.
Authors:Benjamin Watson, Janet Kim, Tim McEneany, Tom Moher, Claudia Hindo, Louis Gomez, Stephen Fransen
Abstract:
The StorySpace project studies the role new interface technologies might play in high school education. With this approach in mind, StorySpace is specifically designed to support and enhance classroom narrative, an already well-established classroom activity. StorySpace strives to achieve this through adherence to three design goals. The first is to trigger student reflection and interpretation. The narrative medium created by StorySpace should represent the topic of classroom discussion and learning in all its complexity. In building their representation, the students will then be confronted with that same complexity. The medium should also itself be exciting and compelling, making classroom narrative interesting and fun.
Authors:Kathy Zhuang, Zixun Huang, Yukun Song, Rui Li, Yinuo Zhou, Allen Y. Yang
Abstract:
As modern computing advances, new interaction paradigms have emerged, particularly in Augmented Reality (AR), which overlays virtual interfaces onto physical objects. This evolution poses challenges in machine perception, especially for tasks like 3D object pose estimation in complex, dynamic environments. Our project addresses critical issues in human-robot interaction within mobile AR, focusing on non-intrusive, spatially aware interfaces. We present URSA, an LLM-driven immersive AR system developed for NASA's 2023-2024 SUITS challenge, targeting future spaceflight needs such as the Artemis missions. URSA integrates three core technologies: a head-mounted AR device (e.g., HoloLens) for intuitive visual feedback, voice control powered by large language models for hands-free interaction, and robot tracking algorithms that enable accurate 3D localization in dynamic settings. To enhance precision, we leverage digital twin localization technologies, using datasets like DTTD-Mobile and specialized hardware such as the ZED2 camera for real-world tracking under noise and occlusion. Our system enables real-time robot control and monitoring via an AR interface, even in the absence of ground-truth sensors--vital for hazardous or remote operations. Key contributions include: (1) a non-intrusive AR interface with LLM-based voice input; (2) a ZED2-based dataset tailored for non-rigid robotic bodies; (3) a Local Mission Control Console (LMCC) for mission visualization; (4) a transformer-based 6DoF pose estimator (DTTDNet) optimized for depth fusion and real-time tracking; and (5) end-to-end integration for astronaut mission support. This work advances digital twin applications in robotics, offering scalable solutions for both aerospace and industrial domains.
Authors:Hafsah Mahzabin Chowdhury, Sharifa Sultana
Abstract:
Reproductive well-being is shaped by intersecting cultural, religious, gendered, and political contexts, yet current technologies often reflect narrow, Western-centric assumptions. In this literature review, we synthesize findings from 147 peer-reviewed papers published between 2015 and 2025 across HCI, CSCW and social computing, ICTD, digital and public health, and AI for well-being scholarship to map the evolving reproductive well-being landscape. We identify three thematic waves that focused on early access and education, cultural sensitivity and privacy, and AI integration with policy-aware design, and highlight how technologies support or constrain diverse reproductive experiences. Our analysis reveals critical gaps in inclusivity, with persistent exclusions of men and non-binary users, migrants, and users in the Global South. Additionally, we surfaced the significant absence of literature on the role of stakeholders (e.g., husband and family members, household maids and cleaning helping hands, midwife, etc.) in the reproductive well-being space. Drawing on the findings from the literature, we propose the ReWA framework to support reproductive well-being for all agendas through six design orientations associated with: location, culture, and history; polyvocality and agency; rationality, temporality, distributive roles, and methodology.
Authors:Shayan Dadman, Bernt Arild Bremdal, Andreas Bergsland
Abstract:
This study presents an exploratory evaluation of Music Generation Systems (MGS) within contemporary music production workflows by examining eight open-source systems. The evaluation framework combines technical insights with practical experimentation through criteria specifically designed to investigate the practical and creative affordances of the systems within the iterative, non-linear nature of music production. Employing a single-evaluator methodology as a preliminary phase, this research adopts a mixed approach utilizing qualitative methods to form hypotheses subsequently assessed through quantitative metrics. The selected systems represent architectural diversity across both symbolic and audio-based music generation approaches, spanning composition, arrangement, and sound design tasks. The investigation addresses limitations of current MGS in music production, challenges and opportunities for workflow integration, and development potential as collaborative tools while maintaining artistic authenticity. Findings reveal these systems function primarily as complementary tools enhancing rather than replacing human expertise. They exhibit limitations in maintaining thematic and structural coherence that emphasize the indispensable role of human creativity in tasks demanding emotional depth and complex decision-making. This study contributes a structured evaluation framework that considers the iterative nature of music creation. It identifies methodological refinements necessary for subsequent comprehensive evaluations and determines viable areas for AI integration as collaborative tools in creative workflows. The research provides empirically-grounded insights to guide future development in the field.
Authors:Linhao Meng, Stef van den Elzen, Anna Vilanova
Abstract:
Traditional instance-based model analysis focuses mainly on misclassified instances. However, this approach overlooks the varying difficulty associated with different instances. Ideally, a robust model should recognize and reflect the challenges presented by intrinsically difficult instances. It is also valuable to investigate whether the difficulty perceived by the model aligns with that perceived by humans. To address this, we propose incorporating instance difficulty into the deep neural network evaluation process, specifically for supervised classification tasks on image data. Specifically, we consider difficulty measures from three perspectives -- data, model, and human -- to facilitate comprehensive evaluation and comparison. Additionally, we develop an interactive visual tool, DifficultyEyes, to support the identification of instances of interest based on various difficulty patterns and to aid in analyzing potential data or model issues. Case studies demonstrate the effectiveness of our approach.
Authors:Xi Xuan, King-kui Sin, Yufei Zhou, Chunyu Kit
Abstract:
Multi-agent systems empowered by large language models (LLMs) have demonstrated remarkable capabilities in a wide range of downstream applications, including machine translation. However, the potential of LLMs in translating Hong Kong legal judgments remains uncertain due to challenges such as intricate legal terminology, culturally embedded nuances, and strict linguistic structures. In this work, we introduce TransLaw, a novel multi-agent framework implemented for real-world Hong Kong case law translation. It employs three specialized agents, namely, Translator, Annotator, and Proofreader, to collaboratively produce translations for high accuracy in legal meaning, appropriateness in style, and adequate coherence and cohesion in structure. This framework supports customizable LLM configurations and achieves tremendous cost reduction compared to professional human translation services. We evaluated its performance using 13 open-source and commercial LLMs as agents and obtained interesting findings, including that it surpasses GPT-4o in legal semantic accuracy, structural coherence, and stylistic fidelity, yet trails human experts in contextualizing complex terminology and stylistic naturalness. Our platform website is available at CityUHK, and our bilingual judgment corpus used for the evaluation is available at Hugging Face.
Authors:Mihnea Stefan Calota, Wessel Nieuwenhuys, Janet Yi-Ching Huang, Lin-Lin Chen, Mathias Funk
Abstract:
Designers have ample opportunities to impact the healthcare domain. However, hospitals are often closed ecosystems that pose challenges in engaging clinical stakeholders, developing domain knowledge, and accessing relevant systems and data. In this paper, we introduce a making-oriented approach to help designers understand the intricacies of their target healthcare context. Using Remote Patient Monitoring (RPM) as a case study, we explore how manually crafting synthetic datasets based on real-world observations enables designers to learn about complex data-driven healthcare systems. Our process involves observing and modeling the real-world RPM context, crafting synthetic datasets, and iteratively prototyping a simplified RPM system that balances contextual richness and intentional abstraction. Through this iterative process of sensemaking through making, designers can still develop context familiarity when direct access to the actual healthcare system is limited. Our approach emphasizes the value of hands-on interaction with data structures to support designers in understanding opaque healthcare systems.
Authors:Emin Zerman, Jonas Carlsson, Mårten Sjöström
Abstract:
Marksmanship practices are required in various professions, including police, military personnel, hunters, as well as sports shooters, such as Olympic shooting, biathlon, and modern pentathlon. The current form of training and coaching is mostly based on repetition, where the coach does not see through the eyes of the shooter, and analysis is limited to stance and accuracy post-session. In this study, we present a shooting visualization system and evaluate its perceived effectiveness for both novice and expert shooters. To achieve this, five composite visualizations were developed using first-person shooting video recordings enriched with overlaid metrics and graphical summaries. These views were evaluated with 10 participants (5 expert marksmen, 5 novices) through a mixed-methods study including shot-count and aiming interpretation tasks, pairwise preference comparisons, and semi-structured interviews. The results show that a dashboard-style composite view, combining raw video with a polar plot and selected graphs, was preferred in 9 of 10 cases and supported understanding across skill levels. The insights gained from this design study point to the broader value of integrating first-person video with visual analytics for coaching, and we suggest directions for applying this approach to other precision-based sports.
Authors:Changsoo Jung, Sheikh Mannan, Jack Fitzgerald, Nathaniel Blanchard
Abstract:
Interactive and spatially aware technologies are transforming educational frameworks, particularly in K-12 settings where hands-on exploration fosters deeper conceptual understanding. However, during collaborative tasks, existing systems often lack the ability to accurately capture real-world interactions between students and physical objects. This issue could be addressed with automatic 6D pose estimation, i.e., estimation of an object's position and orientation in 3D space from RGB images or videos. For collaborative groups that interact with physical objects, 6D pose estimates allow AI systems to relate objects and entities. As part of this work, we introduce FiboSB, a novel and challenging 6D pose video dataset featuring groups of three participants solving an interactive task featuring small hand-held cubes and a weight scale. This setup poses unique challenges for 6D pose because groups are holistically recorded from a distance in order to capture all participants -- this, coupled with the small size of the cubes, makes 6D pose estimation inherently non-trivial. We evaluated four state-of-the-art 6D pose estimation methods on FiboSB, exposing the limitations of current algorithms on collaborative group work. An error analysis of these methods reveals that the 6D pose methods' object detection modules fail. We address this by fine-tuning YOLO11-x for FiboSB, achieving an overall mAP_50 of 0.898. The dataset, benchmark results, and analysis of YOLO11-x errors presented here lay the groundwork for leveraging the estimation of 6D poses in difficult collaborative contexts.
Authors:Ji Hwan Park, Braden Roper, Amirhossein Arezoumand, Tien Tran
Abstract:
We investigate methods for placing labels in AR environments that have visually cluttered scenes. As the number of items increases in a scene within the user' FOV, it is challenging to effectively place labels based on existing label placement guidelines. To address this issue, we implemented three label placement techniques for in-view objects for AR applications. We specifically target a scenario, where various items of different types are scattered within the user's field of view, and multiple items of the same type are situated close together. We evaluate three placement techniques for three target tasks. Our study shows that using a label to spatially group the same types of items is beneficial for identifying, comparing, and summarizing data.
Authors:Matthias Huemmer, Theophile Shyiramunda, Michelle J. Cummings-Koether
Abstract:
This paper proposes a rigorous framework to examine the two-way relationship between artificial intelligence (AI), human cognition, problem-solving, and cultural adaptation across academic and business settings. It addresses a key gap by asking how AI reshapes cognitive processes and organizational norms, and how cultural values and institutional contexts shape AI adoption, trust, and use over time. We employ a three-wave longitudinal design that tracks AI knowledge, perceived competence, trust trajectories, and cultural responses. Participants span academic institutions and diverse firms, enabling contextual comparison. A dynamic sample continuous, intermittent, and wave-specific respondents mirrors real organizational variability and strengthens ecological validity. Methodologically, the study integrates quantitative longitudinal modeling with qualitative thematic analysis to capture temporal, structural, and cultural patterns in AI uptake. We trace AI acculturation through phases of initial resistance, exploratory adoption, and cultural embedding, revealing distinctive trust curves and problem-solving strategies by context: academic environments tend to collaborative, deliberative integration; business environments prioritize performance, speed, and measurable outcomes. Framing adoption as bidirectional challenges deterministic views: AI both reflects and reconfigures norms, decision-making, and cognitive engagement. As the first comparative longitudinal study of its kind, this work advances methodological rigor and offers actionable foundations for human-centred, culturally responsive AI strategies-supporting evidence-based policies, training, and governance that align cognitive performance, organizational goals, and ethical commitments.
Authors:Shih-Chieh Sun, Yun-Cheng Tsai
Abstract:
This paper presents an AI-driven IoT robotic teleoperation system designed for real-time remote manipulation and intelligent visual monitoring, tailored for smart city applications. The architecture integrates a Flutter-based cross-platform mobile interface with MQTT-based control signaling and WebRTC video streaming via the LiveKit framework. A YOLOv11-nano model is deployed for lightweight object detection, enabling real-time perception with annotated visual overlays delivered to the user interface. Control commands are transmitted via MQTT to an ESP8266-based actuator node, which coordinates multi-axis robotic arm motion through an Arduino Mega2560 controller. The backend infrastructure is hosted on DigitalOcean, ensuring scalable cloud orchestration and stable global communication. Latency evaluations conducted under both local and international VPN scenarios (including Hong Kong, Japan, and Belgium) demonstrate actuator response times as low as 0.2 seconds and total video latency under 1.2 seconds, even across high-latency networks. This low-latency dual-protocol design ensures responsive closed-loop interaction and robust performance in distributed environments. Unlike conventional teleoperation platforms, the proposed system emphasizes modular deployment, real-time AI sensing, and adaptable communication strategies, making it well-suited for smart city scenarios such as remote infrastructure inspection, public equipment servicing, and urban automation. Future enhancements will focus on edge-device deployment, adaptive routing, and integration with city-scale IoT networks to enhance resilience and scalability.
Authors:Sheng Jin, Haoming Wang, Zhiqi Gao, Yongbo Yang, Bao Chunjia, Chengliang Wang
Abstract:
Large language models (LLMs) based Agents are increasingly pivotal in simulating and understanding complex human systems and interactions. We propose the AI-Agent School (AAS) system, built around a self-evolving mechanism that leverages agents for simulating complex educational dynamics. Addressing the fragmented issues in teaching process modeling and the limitations of agents performance in simulating diverse educational participants, AAS constructs the Zero-Exp strategy, employs a continuous "experience-reflection-optimization" cycle, grounded in a dual memory base comprising experience and knowledge bases and incorporating short-term and long-term memory components. Through this mechanism, agents autonomously evolve via situated interactions within diverse simulated school scenarios. This evolution enables agents to more accurately model the nuanced, multi-faceted teacher-student engagements and underlying learning processes found in physical schools. Experiment confirms that AAS can effectively simulate intricate educational dynamics and is effective in fostering advanced agent cognitive abilities, providing a foundational stepping stone from the "Era of Experience" to the "Era of Simulation" by generating high-fidelity behavioral and interaction data.
Authors:Harin Yoon, Dongwhan Kim, Changhoon Oh, Soojin Jun
Abstract:
In recent years, discussions on integrating Artificial Intelligence (AI) into UX design have intensified. However, the practical application of AI tools in design is limited by their operation within overly simplified scenarios, inherent complexity and unpredictability, and a general lack of relevant education. This study proposes an effective UXer-AI collaboration process to address these issues and seeks to identify efficient AI collaboration strategies through a series of user studies. In a preliminary study, two participatory design workshops identified major barriers to UXer-AI collaboration, including unfamiliarity with AI, inadequate internal support, and trust issues. To address the particularly critical issue of diminished trust, this study developed a new AI prototype model, TW-AI, that incorporates verification and decision-making processes to enhance trust and operational efficiency in UX design tasks. Task performance experiments and in-depth interviews evaluated the TW-AI model, revealing significant improvements in practitioners' trust, work efficiency, understanding of usage timing, and controllability. The "Source" function, based on Retrieval-Augmented Generation (RAG) technology, notably enhanced the reliability of the AI tool. Participants noted improved communication efficiency and reduced decision-making time, attributing these outcomes to the model's comprehensive verification features and streamlined approach to complex verification tasks. This study advances UXer-AI collaboration by providing key insights, bridging research and practice with actionable strategies, and establishing guidelines for AI tool designs tailored to UX. It contributes to the HCI community by outlining a scalable UXer-AI collaboration framework that addresses immediate operational challenges and lays the foundation for future advancements in AI-driven UX methodologies.
Authors:Soraya S. Anvari, Rina R. Wehbe
Abstract:
Large Language Models (LLMs) are increasingly deployed in mental health contexts, from structured therapeutic support tools to informal chat-based well-being assistants. While these systems increase accessibility, scalability, and personalization, their integration into mental health care brings privacy and safety challenges that have not been well-examined. Unlike traditional clinical interactions, LLM-mediated therapy often lacks a clear structure for what information is collected, how it is processed, and how it is stored or reused. Users without clinical guidance may over-disclose personal information, which is sometimes irrelevant to their presenting concern, due to misplaced trust, lack of awareness of data risks, or the conversational design of the system. This overexposure raises privacy concerns and also increases the potential for LLM bias, misinterpretation, and long-term data misuse. We propose a framework embedding Artificial Intelligence (AI) literacy interventions directly into mental health conversational systems, and outline a study plan to evaluate their impact on disclosure safety, trust, and user experience.
Authors:Jaroslaw Domaszewicz, Damian Sienicki, Michal Obirek
Abstract:
Excessive smartphone use is now widely considered a personal and societal problem. It is recognized by application and smartphone makers, who provide tools to track the amount of use, set limits, or block certain services at predefined times. These tools, while powerful, may require significant cognitive effort to operate: configuration parameters need to be set, and captured statistics need to be analyzed. To offer a complementary solution, we propose a radically different approach. We employ the keyboard of a smartphone as an output device. With each press of a key, the user is given a high-level, qualitative, color-encoded estimate of the amount of recent smartphone use. The technique, dubbed the informative keyboard, is a case of implicit interaction: the user's intention is to enter text but, while typing, they receive the feedback. In the paper, we elaborate the concept, identify design decisions, describe our implementation, present the outcome of a questionnaire-based evaluation, and point to some other applications of the informative keyboard.
Authors:Ryota Takamidoa, Chiharu Suzukia, Hiroki Nakamoto
Abstract:
A critical challenge in contemporary sports science lies in filling the gap between group-level insights derived from controlled hypothesis-driven experiments and the real-world need for personalized coaching tailored to individual athletes' unique movement patterns. This study developed a Personalized Motion Guidance Framework (PMGF) to enhance athletic performance by generating individualized motion-refinement guides using generative artificial intelligence techniques. PMGF leverages a vertical autoencoder to encode motion sequences into athlete-specific latent representations, which can then be directly manipulated to generate meaningful guidance motions. Two manipulation strategies were explored: (1) smooth interpolation between the learner's motion and a target (e.g., expert) motion to facilitate observational learning, and (2) shifting the motion pattern in an optimal direction in the latent space using a local optimization technique. The results of the validation experiment with data from 51 baseball pitchers revealed that (1) PMGF successfully generated smooth transitions in motion patterns between individuals across all 1,275 pitcher pairs, and (2) the features significantly altered through PMGF manipulations reflected known performance-enhancing characteristics, such as increased stride length and knee extension associated with higher ball velocity, indicating that PMGF induces biomechanically plausible improvements. We propose a future extension called general-PMGF to enhance the applicability of this framework. This extension incorporates bodily, environmental, and task constraints into the generation process, aiming to provide more realistic and versatile guidance across diverse sports contexts.
Authors:Moona Kanwal, Muhammad Sami Siddiqui, Syed Anael Ali
Abstract:
Profiling gamers provides critical insights for adaptive game design, behavioral understanding, and digital well-being. This study proposes an integrated, data-driven framework that combines psychological measures, behavioral analytics, and machine learning to reveal underlying gamer personas. A structured survey of 250 participants, including 113 active gamers, captured multidimensional behavioral, motivational, and social data. The analysis pipeline integrated feature engineering, association-network, knowledge-graph analysis, and unsupervised clustering to extract meaningful patterns. Correlation statistics uses Cramers V, Tschuprows T, Theils U, and Spearmans quantified feature associations, and network centrality guided feature selection. Dimensionality-reduction techniques such as PCA, SVD, t-SNE are coupled with clustering algorithms like K-Means, Agglomerative, Spectral, DBSCAN, evaluated using Silhouette, Calinski Harabasz, and Davies Bouldin indices. The PCA with K-Means with k = 4 model achieved optimal cluster quality with Silhouette = 0.4, identifying four archetypes as Immersive Social Story-Seekers, Disciplined Optimizers, Strategic Systems Navigators, and Competitive Team-Builders. This research contributes a reproducible pipeline that links correlation-driven network insights with unsupervised learning. The integration of behavioral correlation networks with clustering not only enhances classification accuracy but also offers a holistic lens to connect gameplay motivations with psychological and wellness outcomes.
Authors:Shuai Guo, Dawei Liu, Tiantian Zheng
Abstract:
This paper critiques the limits of human-centered design in HCI, proposing a shift toward Interface-Centered Design. Drawing on Hookway's philosophy of interfaces, phenomenology, and embodied interaction, we created Umbilink, an umbilical interaction device simulating a uterine environment with tactile sensors and rhythmic feedback to induce a pre-subjectivized state of sensory reduction. Participants' experiences were captured through semi-structured interviews and analyzed with grounded theory. Our contributions are: (1) introducing the novel interface type of Umbilical Interaction; (2) demonstrating the cognitive value of materialized interfaces in a human-interface-environment relation; (3) highlighting the design role of wearing rituals as liminal experiences. As a pilot study, this design suggests imaginative applications in healing, meditation, and sleep, while offering a speculative tool for future interface research.
Authors:Angel Hsing-Chi Hwang, Fiona Li, Jacy Reese Anthis, Hayoun Noh
Abstract:
The quickly growing popularity of AI companions poses risks to mental health, personal wellbeing, and social relationships. Past work has identified many individual factors that can drive human-companion interaction, but we know little about how these factors interact and evolve over time. In Study 1, we surveyed AI companion users (N = 303) to map the psychological pathway from users' mental models of the agent to parasocial experiences, social interaction, and the psychological impact of AI companions. Participants' responses foregrounded multiple interconnected variables (agency, parasocial interaction, and engagement) that shape AI companionship. In Study 2, we conducted a longitudinal study with a subset of participants (N = 110) using a new generic chatbot. Participants' perceptions of the generic chatbot significantly converged to perceptions of their own companions by Week 3. These results suggest a longitudinal model of AI companionship development and demonstrate an empirical method to study human-AI companionship.
Authors:Yibo Meng, Ruiqi Chen, Zhiming Liu, Xiaolan Ding, Yan Guan
Abstract:
Generative AI systems are increasingly adopted by patients seeking everyday health guidance, yet their reliability and clinical appropriateness remain uncertain. Taking Type 2 Diabetes Mellitus (T2DM) as a representative chronic condition, this paper presents a two-part mixed-methods study that examines how patients and physicians in China evaluate the quality and usability of AI-generated health information. Study~1 analyzes 784 authentic patient questions to identify seven core categories of informational needs and five evaluation dimensions -- \textit{Accuracy, Safety, Clarity, Integrity}, and \textit{Action Orientation}. Study~2 involves seven endocrinologists who assess responses from four mainstream AI models across these dimensions. Quantitative and qualitative findings reveal consistent strengths in factual and lifestyle guidance but significant weaknesses in medication interpretation, contextual reasoning, and empathy. Patients view AI as an accessible ``pre-visit educator,'' whereas clinicians highlight its lack of clinical safety and personalization. Together, the findings inform design implications for interactive health systems, advocating for multi-model orchestration, risk-aware fallback mechanisms, and emotionally attuned communication to ensure trustworthy AI assistance in chronic disease care.
Authors:Margarete Jahrmann, Thomas Brandstetter, Stefan Glasauer
Abstract:
The paper presents the first results of an artistic research project investigating how Large Language Models (LLMs) curate and present collective memory. In a public installation exhibited during two months in Vienna in 2025, visitors could interact with five different LLMs (ChatGPT with GPT 4o and GPT 4o mini, Mistral Large, DeepSeek-Chat, and a locally run Llama 3.1 model), which were instructed to act as narrators, implementing a role-playing game revolving around the murder of Austrian philosopher Moritz Schlick in 1936. Results of the investigation include protocols of LLM-user interactions during the game and qualitative conversations after the play experience to get insight into the players' reactions to the game. In a quantitative analysis 115 introductory texts for role-playing generated by the LLMs were examined by different methods of natural language processing, including semantic similarity and sentiment analysis. While the qualitative player feedback allowed to distinguish three distinct types of users, the quantitative text analysis showed significant differences between how the different LLMs presented the historical content. Our study thus adds to ongoing efforts to analyse LLM performance, but also suggests a way of how these efforts can be disseminated in a playful way to a general audience.
Authors:Donghan Hu, Rameen Mahmood, Annabelle David, Danny Yuxing Huang
Abstract:
AI-driven applications have become woven into students' academic and creative workflows, influencing how they learn, write, and produce ideas. Gaining a nuanced understanding of these usage patterns is essential, yet conventional survey and interview methods remain limited by recall bias, self-presentation effects, and the underreporting of habitual behaviors. While ethnographic methods offer richer contextual insights, they often face challenges of scale and reproducibility. To bridge this gap, we introduce a privacy-conscious approach that repurposes VPN-based network traffic analysis as a scalable ethnographic technique for examining students' real-world engagement with AI tools. By capturing anonymized metadata rather than content, this method enables fine-grained behavioral tracing while safeguarding personal information, thereby complementing self-report data. A three-week field deployment with university students reveals fragmented, short-duration interactions across multiple tools and devices, with intense bursts of activity coinciding with exam periods-patterns mirroring institutional rhythms of academic life. We conclude by discussing methodological, ethical, and empirical implications, positioning network traffic analysis as a promising avenue for large-scale digital ethnography on technology-in-practice.
Authors:Ayoub Bouguettaya, Elizabeth M. Stuart
Abstract:
The lexical hypothesis posits that personality traits are encoded in language and is foundational to models like the Big Five. We created a bottom-up personality model from a classic adjective list using machine learning and compared its descriptive utility against the Big Five by analyzing one million Reddit comments. The Big Five, particularly Agreeableness, Conscientiousness, and Neuroticism, provided a far more powerful and interpretable description of these online communities. In contrast, our machine-learning clusters provided no meaningful distinctions, failed to recover the Extraversion trait, and lacked the psychometric coherence of the Big Five. These results affirm the robustness of the Big Five and suggest personality's semantic structure is context-dependent. Our findings show that while machine learning can help check the ecological validity of established psychological theories, it may not be able to replace them.
Authors:Nishant Gautam, Somya Sharma, Peter Corcoran, Kaspar Althoefer
Abstract:
Pseudo-haptics exploit carefully crafted visual or auditory cues to trick the brain into "feeling" forces that are never physically applied, offering a low-cost alternative to traditional haptic hardware. Here, we present a comparative psychophysical study that quantifies how visual and auditory stimuli combine to evoke pseudo-haptic pressure sensations on a commodity tablet. Using a Unity-based Rollball game, participants (n = 4) guided a virtual ball across three textured terrains while their finger forces were captured in real time with a Robotous RFT40 force-torque sensor. Each terrain was paired with a distinct rolling-sound profile spanning 440 Hz - 4.7 kHz, 440 Hz - 13.1 kHz, or 440 Hz - 8.9 kHz; crevice collisions triggered additional "knocking" bursts to heighten realism. Average tactile forces increased systematically with cue intensity: 0.40 N, 0.79 N and 0.88 N for visual-only trials and 0.41 N, 0.81 N and 0.90 N for audio-only trials on Terrains 1-3, respectively. Higher audio frequencies and denser visual textures both elicited stronger muscle activation, and their combination further reduced the force needed to perceive surface changes, confirming multisensory integration. These results demonstrate that consumer-grade isometric devices can reliably induce and measure graded pseudo-haptic feedback without specialized actuators, opening a path toward affordable rehabilitation tools, training simulators and assistive interfaces.
Authors:Trevor DePodesta, Johanna Beyer
Abstract:
Existing digital book management platforms often fail to capture the rich spatial and visual cues inherent to physical bookshelves, hindering users' ability to fully engage with their collections. We present LibraryLens, a novel visualization tool that addresses these shortcomings by enabling users to create, explore, and interact with immersive, two-dimensional representations of their personal libraries. The tool also caters to the growing trend of social sharing within online book communities, allowing users to create visually appealing representations of their libraries that can be easily shared on social platforms. Despite limitations inherent to the metadata being rendered, formative evaluations suggest that LibraryLens has the potential to lower the barrier to entry for users seeking to optimize their book organization without the constraints of physical space or manual labor, ultimately fostering deeper engagement with their personal libraries.
Authors:Sam Lau, Kianoosh Boroojeni, Harry Keeling, Jenn Marroquin
Abstract:
Generative AI (GenAI) tools are increasingly pervasive, pushing instructors to redesign how students use GenAI tools in coursework. We conceptualize this work as emergency pedagogical design: reactive, indirect efforts by instructors to shape student-AI interactions without control over commercial interfaces. To understand practices of lead users conducting emergency pedagogical design, we conducted interviews (n=13) and a survey (n=169) of computing instructors. These instructors repeatedly encountered five barriers: fragmented buy-in for revising courses; policy crosswinds from non-prescriptive institutional guidance; implementation challenges as instructors attempt interventions; assessment misfit as student-AI interactions are only partially visible to instructors; and lack of resources, including time, staffing, and paid tool access. We use these findings to present emergency pedagogical design as a distinct design setting for HCI and outline recommendations for HCI researchers, academic institutions, and organizations to effectively support instructors in adapting courses to GenAI.
Authors:Brandon Lit, Edward Crowder, Daniel Vogel, Hassan Khan
Abstract:
AI chatbots are an emerging security attack vector, vulnerable to threats such as prompt injection, and rogue chatbot creation. When deployed in domains such as corporate security policy, they could be weaponized to deliver guidance that intentionally undermines system defenses. We investigate whether users can be tricked by a compromised AI chatbot in this scenario. A controlled study (N=15) asked participants to use a chatbot to complete security-related tasks. Without their knowledge, the chatbot was manipulated to give incorrect advice for some tasks. The results show how trust in AI chatbots is related to task familiarity, and confidence in their ownn judgment. Additionally, we discuss possible reasons why people do or do not trust AI chatbots in different scenarios.
Authors:Jijie Zhou, Yuhan Hu
Abstract:
Recently, large language models have facilitated the emergence of highly intelligent conversational AI capable of engaging in human-like dialogues. However, a notable distinction lies in the fact that these AI models predominantly generate responses rapidly, often producing extensive content without emulating the thoughtful process characteristic of human cognition and typing. This paper presents a design aimed at simulating human-like typing behaviors, including patterns such as hesitation and self-editing, as well as a preliminary user experiment to understand whether and to what extent the agent with human-like typing behaviors could potentially affect conversational engagement and its trustworthiness. We've constructed an interactive platform featuring user-adjustable parameters, allowing users to personalize the AI's communication style and thus cultivate a more enriching and immersive conversational experience. Our user experiment, involving interactions with three types of agents - a baseline agent, one simulating hesitation, and another integrating both hesitation and self-editing behaviors - reveals a preference for the agent that incorporates both behaviors, suggesting an improvement in perceived naturalness and trustworthiness. Through the insights from our design process and both quantitative and qualitative feedback from user experiments, this paper contributes to the multimodal interaction design and user experience for conversational AI, advocating for a more human-like, engaging, and trustworthy communication paradigm.
Authors:Ruijie Wang, Jie Lu, Bo Pei, Evonne Jones, Jamey Brinson, Timothy Brown
Abstract:
Interprofessional education has long relied on case studies and the use of standardized patients to support teamwork, communication, and related collaborative competencies among healthcare professionals. However, traditional approaches are often limited by cost, scalability, and inability to mimic the dynamic complexity of real-world clinical scenarios. To address these challenges, we designed and developed AIMS (AI-Enhanced Immersive Multidisciplinary Simulations), a virtual simulation that integrates a large language model (Gemini-2.5-Flash), a Unity-based virtual environment engine, and a character creation pipeline to support synchronized, multimodal interactions between the user and the virtual patient. AIMS was designed to enhance collaborative clinical reasoning and health promotion competencies among students from pharmacy, medicine, nursing, and social work. A formal usability testing session was conducted which participants assumed professional roles on a healthcare team and engaged in a mix of scripted and unscripted conversations. Participants explored the patient's symptoms, social context, and care needs. Usability issues were identified (e.g., audio routing, response latency) and used to guide subsequent refinements. Findings in general suggest that AIMS supports realistic, profession-specific and contextually appropriate conversations. We discussed both technical and pedagogical innovations of AIMS and concluded with future directions.
Authors:Yashodip Dharmendra Jagtap, Aaditya Ganesh Bagul
Abstract:
Electronic waste (e-waste) is a rapidly growing global problem caused by shorter device lifecycles and rising consumption. India ranks third globally in e-waste generation, producing over 1.7 million tonnes in 2023-24, of which less than half is formally processed. To address this, we propose Green Grid, an integrated AI-powered e-waste management platform combining IoT-enabled smart collection, AI-based device classification, blockchain-based traceability, and gamified citizen engagement. The system features smart recycling bins with sensors for real-time monitoring, deep learning models for device identification and sorting, a blockchain ledger for tamper-proof tracking, and a reward-based mobile or web app to encourage user participation. Additionally, Green Grid offers analytics dashboards and an eco-marketplace to support policymakers and recyclers. By bridging technology, sustainability, and community participation, the platform enhances transparency, increases formal recycling rates, and advances India's transition toward a circular economy.
Authors:Wouter Haverals, Meredith Martin
Abstract:
As AI writing tools become widespread, we need to understand how both humans and machines evaluate literary style, a domain where objective standards are elusive and judgments are inherently subjective. We conducted controlled experiments using Raymond Queneau's Exercises in Style (1947) to measure attribution bias across evaluators. Study 1 compared human participants (N=556) and AI models (N=13) evaluating literary passages from Queneau versus GPT-4-generated versions under three conditions: blind, accurately labeled, and counterfactually labeled. Study 2 tested bias generalization across a 14$\times$14 matrix of AI evaluators and creators. Both studies revealed systematic pro-human attribution bias. Humans showed +13.7 percentage point (pp) bias (Cohen's h = 0.28, 95% CI: 0.21-0.34), while AI models showed +34.3 percentage point bias (h = 0.70, 95% CI: 0.65-0.76), a 2.5-fold stronger effect (P$<$0.001). Study 2 confirmed this bias operates across AI architectures (+25.8pp, 95% CI: 24.1-27.6%), demonstrating that AI systems systematically devalue creative content when labeled as "AI-generated" regardless of which AI created it. We also find that attribution labels cause evaluators to invert assessment criteria, with identical features receiving opposing evaluations based solely on perceived authorship. This suggests AI models have absorbed human cultural biases against artificial creativity during training. Our study represents the first controlled comparison of attribution bias between human and artificial evaluators in aesthetic judgment, revealing that AI systems not only replicate but amplify this human tendency.
Authors:Jie Zeng, Yukiko I. Nakano
Abstract:
The primary goal of Motivational Interviewing (MI) is to help clients build their own motivation for behavioral change. To support this in dialogue systems, it is essential to guide large language models (LLMs) to generate counselor responses aligned with MI principles. By employing a schema-guided approach, this study proposes a method for updating multi-frame dialogue states and a strategy decision mechanism that dynamically determines the response focus in a manner grounded in MI principles. The proposed method was implemented in a dialogue system and evaluated through a user study. Results showed that the proposed system successfully generated MI-favorable responses and effectively encouraged the user's (client's) deliberation by asking eliciting questions.
Authors:Seokho Jin, Manseo Kim, Sungho Byun, Hansol Kim, Jungmin Lee, Sujeong Baek, Semi Kim, Sanghum Park, Sung Park
Abstract:
Reflective journaling often lacks personalization and fails to engage Generation Alpha and Z, who prefer visually immersive and fast-paced interactions over traditional text-heavy methods. Visual storytelling enhances emotional recall and offers an engaging way to process personal expe- riences. Designed with these digital-native generations in mind, this paper introduces Persode, a journaling system that integrates personalized onboarding, memory-aware conversational agents, and automated visual storytelling. Persode captures user demographics and stylistic preferences through a tailored onboarding process, ensuring outputs resonate with individual identities. Using a Retrieval-Augmented Generation (RAG) framework, it prioritizes emotionally significant memories to provide meaningful, context-rich interactions. Additionally, Persode dynamically transforms user experiences into visually engaging narratives by generating prompts for advanced text-to-image models, adapting characters, backgrounds, and styles to user preferences. By addressing the need for personalization, visual engagement, and responsiveness, Persode bridges the gap between traditional journaling and the evolving preferences of Gen Alpha and Z.
Authors:Shruti Rao, Judith Good, Hamed Alavi
Abstract:
The perspectives of affective interaction in built environments are largely overlooked and instead dominated by affective computing approaches that view emotions as "static", computable states to be detected and regulated. To address this limitation, we interviewed architects to explore how biophilic design -- our deep-rooted emotional connection with nature -- could shape affective interaction design in smart buildings. Our findings reveal that natural environments facilitate self-directed emotional experiences through spatial diversity, embodied friction, and porous sensory exchanges. Based on this, we introduce three design principles for discussion at the Affective Interaction workshop: (1) Diversity of Spatial Experiences, (2) Self-Reflection Through Complexity & Friction, and (3) Permeability & Sensory Exchange with the Outside World, while also examining the challenges of integrating these perspectives into built environments.
Authors:Tom Roy, Yann Glemarec, Gurvan Lecuyer, Quentin Galvane, Philippe Guillotel, Ferran Argelaguet
Abstract:
Haptic technology enhances interactive experiences by providing force and tactile feedback, improving user performance and immersion. However, despite advancements, creating tactile experiences still remains challenging due to device diversity and complexity. Most available haptic frameworks rely on trigger-based or event-based systems, and disregard the information of the 3D scene to render haptic information. This paper introduces Haptic Tracing, a novel method for spatial haptic rendering that simplifies the creation of interactive haptic experiences without relying on physical simulations. It uses concepts from visual and audio rendering to model and propagate haptic information through a 3D scene. The paper also describes how our proposed haptic rendering method can be used to create a vibrotactile rendering system, enabling the creation of perceptually coherent and dynamic haptic interactions. Finally, the paper discusses a user study that explores the role of the haptic propagation and multi-actuator rendering on the users' haptic experience. The results show that our approach significantly enhances the realism and the expressivity of the haptic feedback, showcasing its potential for developing more complex and realistic haptic experiences.
Authors:K. K. Kim, S. P. McGrath, D. Lindeman
Abstract:
Chronic illnesses are a global concern with essential hypertension and diabetes mellitus among the most common conditions. Remote patient monitoring has shown promising results on clinical and health outcomes. However, access to care and digital health solutions is limited among rural, lower-income, and older adult populations. This paper repots on a pre-post study of a comprehensive care coordination program including connected, wearable blood pressure and glucometer devices, tablets, and medical assistant-provided health coaching in a community health center in rural California. The participants (n=221) had a mean age of 54.6 years, were majority female, two-thirds spoke Spanish, 19.9% had hypertension, 49.8% diabetic, and 30.3% both conditions. Participants with hypertension achieved a mean reduction in systolic blood pressure of 20.24 (95% CI: 13.61, 26.87) at six months while those with diabetes achieved a mean reduction of 3.85 points (95% CI: 3.73, 4.88). These outcomes compare favorably to the small but growing body of evidence supporting digital care coordination and remote monitoring. These results also support the feasibility of well-designed digital health solutions yielding improved health outcomes among underserved communities.
Authors:Alex Cuellar, Ho Chit Siu, Julie A Shah
Abstract:
As robots' manipulation capabilities improve for pick-and-place tasks (e.g., object packing, sorting, and kitting), methods focused on understanding human-acceptable object configurations remain limited expressively with regard to capturing spatial relationships important to humans. To advance robotic understanding of human rules for object arrangement, we introduce positionally-augmented RCC (PARCC), a formal logic framework based on region connection calculus (RCC) for describing the relative position of objects in space. Additionally, we introduce an inference algorithm for learning PARCC specifications via demonstrations. Finally, we present the results from a human study, which demonstrate our framework's ability to capture a human's intended specification and the benefits of learning from demonstration approaches over human-provided specifications.
Authors:Maximilian Frank, Simon Lund
Abstract:
Large Language Models have become widely adopted tools due to their versatile capabilities, yet their user interfaces remain limited, often following rigid, linear interaction paradigms. In this paper, we present insights from a design thinking workshop held at the deRSE25 conference aiming at collaboratively developing innovative user interface concepts for LLMs. During the workshop, participants identified common use cases, evaluated the strengths and shortcomings of current LLM interfaces, and created visualizations of new interaction concepts emphasizing flexible context management, dynamic conversation branching, and enhanced mechanisms for user control. We describe how these participant-generated ideas advanced our own whiteboard-based UI approach. The ongoing development of this interface is guided by the human-centered design process - an iterative, user-focused methodology that emphasizes continuous refinement through user feedback. Broader implications for future LLM interface development are discussed, advocating for increased attention to UI innovation grounded in user-centered design principles.
Authors:Hiroto Sakimura, Takayuki Nagaya, Tomoki Nishi, Tetsuo Kurahashi, Katsunori Kohda, Nobuhiko Muramoto
Abstract:
Estimating emotional states from physiological signals is a central topic in affective computing and psychophysiology. While many emotion estimation systems implicitly assume a stable relationship between physiological features and subjective affect, this assumption has rarely been tested over long timeframes. This study investigates whether such relationships remain consistent across several months within individuals. We developed a custom measurement system and constructed a longitudinal dataset by collecting physiological signals -- including blood volume pulse, electrodermal activity (EDA), skin temperature, and acceleration--along with self-reported emotional states from 24 participants over two three-month periods. Data were collected in naturalistic working environments, allowing analysis of the relationship between physiological features and subjective arousal in everyday contexts. We examined how physiological-arousal relationships evolve over time by using Explainable Boosting Machines (EBMs) to ensure model interpretability. A model trained on 1st-period data showed a 5\% decrease in accuracy when tested on 2nd-period data, indicating long-term variability in physiological-arousal associations. EBM-based comparisons further revealed that while heart rate remained a relatively stable predictor, minimum EDA exhibited substantial individual-level fluctuations between periods. While the number of participants is limited, these findings highlight the need to account for temporal variability in physiological-arousal relationships and suggest that emotion estimation models should be periodically updated -- e.g., every five months -- based on observed shift trends to maintain robust performance over time.
Authors:Vidya Setlur, Samuel Ridet
Abstract:
Spatial computing presents new opportunities for immersive data storytelling, yet there is limited guidance on how to build such experiences or adapt traditional narrative visualizations to this medium. We introduce a toolkit, RÃCITKIT for supporting spatial data narratives in head-mounted display (HMD) environments. The toolkit allows developers to create interactive dashboards, tag data attributes as spatial assets to 3D models and immersive scenes, generate text and audio narratives, enabling dynamic filtering, and hierarchical drill-down data discoverability. To demonstrate the utility of the toolkit, we developed Charles Minard's historical flow map of Napoleon's 1812 campaign in Russia as an immersive experience on Apple Vision Pro. We conducted a preliminary evaluation with 21 participants that comprised two groups: developers, who evaluated the toolkit by authoring spatial stories and consumers, who provided feedback on the Minard app's narrative clarity, interaction design, and engagement. Feedback highlighted how spatial interactions and guided narration enhanced insight formation, with participants emphasizing the benefits of physical manipulation (e.g., gaze, pinch, navigation) for understanding temporal and geographic data. Participants also identified opportunities for future enhancement, including improved interaction affordance visibility, customizable storytelling logic, and integration of contextual assets to support user orientation. These findings contribute to the broader discourse on toolkit-driven approaches to immersive data storytelling across domains such as education, decision support, and exploratory analytics.
Authors:Aparajita Marathe, Anne Marie Piper
Abstract:
Many technology companies aim to improve access and inclusion not only by making their products accessible but also by bringing people with disabilities into the tech workforce. We know less about how accessibility is experienced and negotiated by disabled workers within these organizations. Through interviews with 20 BLV workers across various tech companies, we uncover a persistent misalignment between organizational attempts at accessibility and the current realities of these employees. We introduce the concept of the accessibility paradox, which we define as the inherent tension between the productivity- and profit-driven nature of tech companies and their desire to hire and retain disabled workers. Focusing on the experiences of BLV workers, we show how the accessibility paradox manifests in their everyday workplace interactions, including digital infrastructure, accommodations processes and policies, ability assumptions, and competing priorities. We offer recommendations for future research and practice to understand and improve workplace accessibility and inclusion.
Authors:Kérian Fiter, Louis Malassigné-Onfroy, Bentley Oakes
Abstract:
With Digital Twin (DT) construction and evolution occurring over time, stakeholders require tools to understand the current characteristics and conceptual architecture of the system at any time. We introduce DTInsight, a systematic and automated tool and methodology for producing continuous reporting for DTs. DTInsight offers three key features: (a) an interactive conceptual architecture visualization of DTs; (b) generation of summaries of DT characteristics based on ontological data; and (c) integration of these outputs into a reporting page within a continuous integration and continuous deployment (CI/CD) pipeline. Given a modeled description of the DT aligning to our DT Description Framework (DTDF), DTInsight enables up-to-date and detailed reports for enhanced stakeholder understanding.
Authors:Aditi Bhalla, Christian Hellert, Enkelejda Kasneci, Nastassja Becker
Abstract:
Understanding and estimating driver trust and comfort are essential for the safety and widespread acceptance of autonomous vehicles. Existing works analyze user trust and comfort separately, with limited real-time assessment and insufficient multimodal data. This paper introduces a novel multimodal dataset called TRUCE-AV, focusing on trust and comfort estimation in autonomous vehicles. The dataset collects real-time trust votes and continuous comfort ratings of 31 participants during a simulator-based fully autonomous driving. Simultaneously, physiological signals, such as heart rate, gaze, and emotions, along with environmental data (e.g., vehicle speed, nearby vehicle positions, and velocity), are recorded throughout the drives. Standard pre- and post-drive questionnaires were also administered to assess participants' trust in automation and overall well-being, enabling the correlation of subjective assessments with real-time responses. To demonstrate the utility of our dataset, we evaluated various machine learning models for trust and comfort estimation using physiological data. Our analysis showed that tree-based models like Random Forest and XGBoost and non-linear models such as KNN and MLP regressor achieved the best performance for trust classification and comfort regression. Additionally, we identified key features that contribute to these estimations by using SHAP analysis on the top-performing models. Our dataset enables the development of adaptive AV systems capable of dynamically responding to user trust and comfort levels non-invasively, ultimately enhancing safety, user experience, and human-centered vehicle design.
Authors:Temesgen Kitaw Damenu, İnci Zaim Gökbay, Alexandra Covaci, Shujun Li
Abstract:
Educational games have been widely used to teach children about cyber security. This systematic literature review reveals evidence of positive learning outcomes, after analysing 91 such games reported in 68 papers published between 2010 and 2024. However, critical gaps have also been identified regarding the design processes and the methodological rigour, including lack of systematic design, misalignment between proposed and achieved learning outcomes, rare use of control groups, limited discussions on ethical considerations, and underutilisation of emerging technologies. We recommend multiple future research directions, e.g., a hybrid approach to game design and evaluation that combines bottom-up and top-down approaches.
Authors:Matthew Termuende, Kevin Larson, Miguel Nacenta
Abstract:
When a reader encounters a word in English, they split the word into smaller orthographic units in the process of recognizing its meaning. For example, "rough", when split according to phonemes, is decomposed as r-ou-gh (not as r-o-ugh or r-ough), where each group of letters corresponds to a sound. Since there are many ways to segment a group of letters, this constitutes a computational operation that has to be solved by the reading brain, many times per minute, in order to achieve the recognition of words in text necessary for reading. We hypothesized that providing segmentation information in the text itself could help the reading process by reducing its computational cost. In this paper we explore whether and how different visual interventions could communicate segmentation information for reading and word recognition. We ran a series of pre-registered lexical decision experiments with 192 participants that tested five types of visual segmentations: outlines, spacing, connections, underlines and color. The evidence indicates that, even with a moderate amount of training, these visual interventions always slow down word identification, but each to a different extent. These findings are important because they indicate that, at least for typical adult readers with a moderate amount of specific training in these visual interventions, accelerating the lexical decision task is unlikely. The results also offer an empirical measurement of the cost of a common set of visual manipulations of text, which can be useful for practitioners seeking to visualize alongside or within text without impacting reading performance. Finally, the interaction between typographically encoded information and visual variables presented unique patterns that deviate from existing theories, suggesting new directions for future inquiry.
Authors:Vikranth Udandarao, Nipun Misra
Abstract:
India's developer community faces significant barriers to sustained experimentation and learning with commercial Large Language Model (LLM) APIs, primarily due to economic and infrastructural constraints. This study empirically evaluates local LLM deployment using Ollama as an alternative to commercial cloud-based services for developer-focused applications. Through a mixed-methods analysis involving 180 Indian developers, students, and AI enthusiasts, we find that local deployment enables substantially greater hands-on development and experimentation, while reducing costs by 33% compared to commercial solutions. Developers using local LLMs completed over twice as many experimental iterations and reported deeper understanding of advanced AI architectures. Our results highlight local deployment as a critical enabler for inclusive and accessible AI development, demonstrating how technological accessibility can enhance learning outcomes and innovation capacity in resource-constrained environments.
Authors:Hongrak Pak, Ali Mostafavi
Abstract:
Disasters frequently exceed established hazard models, revealing blind spots where unforeseen impacts and vulnerabilities hamper effective response. This perspective paper contends that situational awareness (SA)-the ability to perceive, interpret, and project dynamic crisis conditions-is an often overlooked yet vital capability for disaster resilience. While risk mitigation measures can reduce known threats, not all hazards can be neutralized; truly adaptive resilience hinges on whether organizations rapidly detect emerging failures, reconcile diverse data sources, and direct interventions where they matter most. We present a technology-process-people roadmap, demonstrating how real-time hazard nowcasting, interoperable workflows, and empowered teams collectively transform raw data into actionable insight. A system-of-systems approach enables federated data ownership and modular analytics, so multiple agencies can share timely updates without sacrificing their distinct operational models. Equally crucial, structured sense-making routines and cognitive load safeguards help humans remain effective decision-makers amid data abundance. By framing SA as a socio-technical linchpin rather than a peripheral add-on, this paper spotlights the urgency of elevating SA to a core disaster resilience objective. We conclude with recommendations for further research-developing SA metrics, designing trustworthy human-AI collaboration, and strengthening inclusive data governance-to ensure that communities are equipped to cope with both expected and unexpected crises.
Authors:Sanika Moharana, Cynthia L. Bennett, Erin Buehler, Michael Madaio, Vinita Tibdewal, Shaun K. Kane
Abstract:
The rapid emergence of generative AI has changed the way that technology is designed, constructed, maintained, and evaluated. Decisions made when creating AI-powered systems may impact some users disproportionately, such as people with disabilities. In this paper, we report on an interview study with 25 AI practitioners across multiple roles (engineering, research, UX, and responsible AI) about how their work processes and artifacts may impact end users with disabilities. We found that practitioners experienced friction when triaging problems at the intersection of responsible AI and accessibility practices, navigated contradictions between accessibility and responsible AI guidelines, identified gaps in data about users with disabilities, and gathered support for addressing the needs of disabled stakeholders by leveraging informal volunteer and community groups within their company. Based on these findings, we offer suggestions for new resources and process changes to better support people with disabilities as end users of AI.
Authors:Yogesh Kumar Meena, Manish Salvi
Abstract:
Over the past decade, the demand for communication devices has increased among individuals with mobility and speech impairments. Eye-gaze tracking has emerged as a promising solution for hands-free communication; however, traditional appearance-based interfaces often face challenges such as accuracy issues, involuntary eye movements, and difficulties with extensive command sets. This work presents a multimodal appearance-based gaze-controlled virtual keyboard that utilises deep learning in conjunction with standard camera hardware, incorporating both synchronous and asynchronous modes for command selection. The virtual keyboard application supports menu-based selection with nine commands, enabling users to spell and type up to 56 English characters, including uppercase and lowercase letters, punctuation, and a delete function for corrections. The proposed system was evaluated with twenty able-bodied participants who completed specially designed typing tasks using three input modalities: (i) a mouse, (ii) an eye-tracker, and (iii) an unmodified webcam. Typing performance was measured in terms of speed and information transfer rate (ITR) at both command and letter levels. Average typing speeds were 18.3+-5.31 letters/min (mouse), 12.60+-2.99letters/min (eye-tracker, synchronous), 10.94 +- 1.89 letters/min (webcam, synchronous), 11.15 +- 2.90 letters/min (eye-tracker, asynchronous), and 7.86 +- 1.69 letters/min (webcam, asynchronous). ITRs were approximately 80.29 +- 15.72 bits/min (command level) and 63.56 +- 11 bits/min (letter level) with webcam in synchronous mode. The system demonstrated good usability and low workload with webcam input, highlighting its user-centred design and promise as an accessible communication tool in low-resource settings.
Authors:MichaÅ Patryk Miazga, Patrick Ebel
Abstract:
Biomechanical forward simulation holds great potential for HCI, enabling the generation of human-like movements in interactive tasks. However, training biomechanical models with reinforcement learning is challenging, particularly for precise and dexterous movements like those required for touchscreen interactions on mobile devices. Current approaches are limited in their interaction fidelity, require restricting the underlying biomechanical model to reduce complexity, and do not generalize well. In this work, we propose practical improvements to training routines that reduce training time, increase interaction fidelity beyond existing methods, and enable the use of more complex biomechanical models. Using a touchscreen pointing task, we demonstrate that curriculum learning, action masking, more complex network configurations, and simple adjustments to the simulation environment can significantly improve the agent's ability to learn accurate touch behavior. Our work provides HCI researchers with practical tips and training routines for developing better biomechanical models of human-like interaction fidelity.
Authors:Kayenat Fatmi, Mohammad Abbas
Abstract:
In the digital era, individuals are increasingly exposed to online harms such as toxicity, manipulation, and grooming, which often pose emotional and safety risks. Existing systems for detecting abusive content or issuing safety alerts operate in isolation and rarely combine digital safety with emotional well-being. In this paper, we present SafeSpace, a unified web application that integrates three modules: (1) toxicity detection in chats and screenshots using NLP models and Google's Perspective API, (2) a configurable safety ping system that issues emergency alerts with the user's live location (longitude and latitude) via SMTP-based emails when check-ins are missed or SOS alerts are manually triggered, and (3) a reflective questionnaire that evaluates relationship health and emotional resilience. The system employs Firebase for alert management and a modular architecture designed for usability, privacy, and scalability. The experimental evaluation shows 93% precision in toxicity detection, 100% reliability in safety alerts under emulator tests, and 92% alignment between automated and manual questionnaire scoring. SafeSpace, implemented as a web application, demonstrates the feasibility of integrating detection, protection, and reflection within a single platform, with future deployment envisioned as a mobile application for broader accessibility.
Authors:Ryogo Niwa, Shigeo Yoshida, Yuki Koyama, Yoshitaka Ushiku
Abstract:
Designing successful interactions requires identifying optimal design parameters. To do so, designers often conduct iterative user testing and exploratory trial-and-error. This involves balancing multiple objectives in a high-dimensional space, making the process time-consuming and cognitively demanding. System-led optimization methods, such as those based on Bayesian optimization, can determine for designers which parameters to test next. However, they offer limited opportunities for designers to intervene in the optimization process, negatively impacting the designer's experience. We propose a design optimization framework that enables natural language interactions between designers and the optimization system, facilitating cooperative design optimization. This is achieved by integrating system-led optimization methods with Large Language Models (LLMs), allowing designers to intervene in the optimization process and better understand the system's reasoning. Experimental results show that our method provides higher user agency than a system-led method and shows promising optimization performance compared to manual design. It also matches the performance of an existing cooperative method with lower cognitive load.
Authors:Ignacio Perez-Messina, Asanobu Kitamoto
Abstract:
We present a visualization system designed to support typographic forensics in the study of Kokatsuji, the short-lived tradition of Japanese movable wooden type printing. Building on recent advances in machine learning for block identification, our system provides expert users with an interactive tool for exploring, validating hypothesis, and integrating expert knowledge into model-generated results about the production process of early printed books. The system is structured around an ontology of four conceptual objects (spreads, segments, blocks, and characters) each corresponding to a dedicated view in the system. These coordinated views enable scholars to navigate between material evidence and computational abstractions, supporting close, near-by, and distant reading practices. Preliminary results from expert use of the system demonstrate its ability to reveal errors in segmentation, inconsistencies in clustering, and previously inaccessible patterns of block reuse.
Authors:Andria Andriuzzi, Géraldine Michel
Abstract:
In social media, marketers attempt to influence consumers by using directive language, that is, expressions designed to get consumers to take action. While the literature has shown that directive messages in advertising have mixed results for recipients, we know little about the effects of directive brand language on consumers who see brands interacting with other consumers in social media conversations. On the basis of a field study and three online experiments, this study shows that directive language in brand conversation has a detrimental downstream effect on engagement of consumers who observe such exchanges. Specifically, in line with Goffman's facework theory, because a brand that encourages consumers to react could be perceived as face-threatening, consumers who see a brand interacting with others in a directive way may feel vicarious embarrassment and engage less (compared with a conversation without directive language). In addition, we find that when the conversation is nonproduct-centered (vs. product-centered), consumers expect more freedom, as in mundane conversations, even for others; therefore, directive language has a stronger negative effect. However, in this context, the strength of the brand relationship mitigates this effect. Thus, this study contributes to the literature on directive language and brand-consumer interactions by highlighting the importance of context in interactive communication, with direct relevance for social media and brand management.
Authors:Seyedali Mohammadi, Manas Paldhe, Amit Chhabra
Abstract:
Phone call transcript labeling is prohibitively expensive (approximately 2 USD per minute) due to privacy regulations, consent requirements, and manual annotation costs requiring 3 hours of expert time per hour of audio. Existing extraction methods fail on conversational speech containing disfluencies, interruptions, and speaker overlap. We introduce LingVarBench, a synthetic data generation pipeline that addresses these constraints through automated validation. First, we prompt an LLM to generate realistic structured field values across multiple use cases. Second, we recursively prompt the model to transform these values into thousands of natural conversational utterances containing typical phone call characteristics. Third, we validate each synthetic utterance by testing whether a separate LLM-based extractor can recover the original structured information. We employ DSPy's SIMBA optimizer to automatically synthesize extraction prompts from validated synthetic transcripts, eliminating manual prompt engineering. Our optimized prompts achieve up to 95 percent accuracy for numeric fields (vs. 88-89 percent zero-shot), 90 percent for names (vs. 47-79 percent), and over 80 percent for dates (vs. 72-77 percent) on real customer transcripts, demonstrating substantial gains over zero-shot prompting. The synthetic-to-real transfer demonstrates that conversational patterns learned from generated data generalize effectively to authentic phone calls containing background noise and domain-specific terminology. LingVarBench provides the first systematic benchmark for structured extraction from synthetic conversational data, demonstrating that automated prompt optimization overcomes cost and privacy barriers preventing large-scale phone call analysis in commercial settings.
Authors:Mark Cote, Susana Aires
Abstract:
This paper argues that a techno-philosophical reading of the EU AI Act provides insight into the long-term dynamics of data in AI systems, specifically, how the lifecycle from ingestion to deployment generates recursive value chains that challenge existing frameworks for Responsible AI. We introduce a conceptual tool to frame the AI pipeline, spanning data, training regimes, architectures, feature stores, and transfer learning. Using cross-disciplinary methods, we develop a technically grounded and philosophically coherent analysis of regulatory blind spots. Our central claim is that what remains absent from policymaking is an account of the dynamic of becoming that underpins both the technical operation and economic logic of AI. To address this, we advance a formal reading of AI inspired by Simondonian philosophy of technology, reworking his concept of individuation to model the AI lifecycle, including the pre-individual milieu, individuation, and individuated AI. To translate these ideas, we introduce futurity: the self-reinforcing lifecycle of AI, where more data enhances performance, deepens personalisation, and expands application domains. Futurity highlights the recursively generative, non-rivalrous nature of data, underpinned by infrastructures like feature stores that enable feedback, adaptation, and temporal recursion. Our intervention foregrounds escalating power asymmetries, particularly the tech oligarchy whose infrastructures of capture, training, and deployment concentrate value and decision-making. We argue that effective regulation must address these infrastructural and temporal dynamics, and propose measures including lifecycle audits, temporal traceability, feedback accountability, recursion transparency, and a right to contest recursive reuse.
Authors:Guillermo Vera-Amaro, José Rafael Rojano-Cáceres
Abstract:
This paper addresses the limited attention given to blind users as content creators in Content Management Systems (CMS), a gap that remains under-explored in web accessibility research. For blind authors, effective interaction with CMS platforms requires more than technical compliance; it demands interfaces designed with semantic clarity, predictable navigation, and meaningful feedback for screen reader users. This study investigates the accessibility barriers blind users face when performing key tasks, such as page creation, menu editing, and image publishing, using CMS platforms. A two-fold evaluation was conducted using automated tools and manual usability testing with three blind and one sighted participant, complemented by expert analysis based on the Barrier Walkthrough method. Results showed that block-based interfaces were particularly challenging, often marked as accessible by automated tools but resulting in critical usability issues during manual evaluation. The use of a text-based editor, the integration of AI-generated image descriptions, and training aligned with screen reader workflows, significantly improved usability and autonomy. These findings underscore the limitations of automated assessments and highlight the importance of user-centered design practices. Enhancing CMS accessibility requires consistent navigation structures, reduced reliance on visual interaction patterns, and the integration of AI tools that support blind content authors throughout the content creation process.
Authors:Basile Pesin, Celia Picard, Cyril Allignol
Abstract:
User Interface Description Languages (UIDLs) are high-level languages that facilitate the development of Human-Machine Interfaces, such as Graphical User Interface (GUI) applications. They usually provide first-class primitives to specify how the program reacts to an external event (user input, network message), and how data flows through the program. Although these domain-specific languages are now widely used to implement safety-critical GUIs, little work has been invested in their formalization and verification.
In this paper, we propose a denotational semantic model for a core reactive UIDL, Smalite, which we argue is expressive enough to encode constructs from more realistic languages. This preliminary work may be used as a stepping stone to produce a formally verified compiler for UIDLs.
Authors:Samra Zafar, Shifa Yousaf, Muhammad Shaheer Minhas
Abstract:
As large language models (LLMs) increasingly assist in evaluating student writing, researchers have begun questioning whether these models can be cognitively grounded, that is, whether they can attend not just to the final product, but to the process by which it was written. In this study, we explore how incorporating writing process data, specifically keylogs and time-stamped snapshots, affects the quality of LLM-generated feedback. We conduct an ablation study on 52 student essays comparing feedback generated with access to only the final essay (C1) and feedback that also incorporates keylogs and time-stamped snapshots (C2). While rubric scores changed minimally, C2 feedback demonstrated significantly improved structural evaluation and greater process-sensitive justification.
Authors:Riya Sinha, Amelia Acker, Hanlin Li
Abstract:
Wikidata, an open structured database and a sibling project to Wikipedia, has recently become an important platform for information professionals to share structured metadata from their memory institutions, organizations that maintain public knowledge and cultural heritage materials. While studies have investigated why and how peer producers contribute to Wikidata, the institutional motivations and practices of these organizations are less understood. Given Wikidata's potential role in linking and supporting knowledge infrastructures and open data systems, we examined why and how information professionals in memory institutions use Wikidata as part of their organizational workflow. Through interviews with 15 participants, we identified the three archetypal roles of Wikidata users within memory institutions, providers, acquirers, and mutualists, and the different types of contributions that these institutions bring to Wikidata. We then explored potential collaboration opportunities between memory institutions and other volunteers in Wikidata, discussed the value of the data work conducted by these professionals, and examined how and why they track their contributions. Our work contributes to the wider discussions around collaboration and data work in CSCW by (1) studying the motivations and practices of information professionals, their differences from those doing volunteer work, and opportunities for the Wikidata community to promote more collaborative efforts within memory institutions and with other volunteers and (2) drawing attention to the important data work done by memory institutions on Wikidata and pointing out opportunities to support the contributions of information professionals.
Authors:Rong Pan, Hongyue Sun, Xiaoyu Chen, Giulia Pedrielli, Jiapeng Huang
Abstract:
Human digital twins (HDTs) are dynamic, data-driven virtual representations of individuals, continuously updated with multimodal data to simulate, monitor, and predict health trajectories. By integrating clinical, physiological, behavioral, and environmental inputs, HDTs enable personalized diagnostics, treatment planning, and anomaly detection. This paper reviews current approaches to HDT modeling, with a focus on statistical and machine learning techniques, including recent advances in anomaly detection and failure prediction. It also discusses data integration, computational methods, and ethical, technological, and regulatory challenges in deploying HDTs for precision healthcare.
Authors:Black Sun, Ge Kacy Fu, Shichao Guo
Abstract:
Traditional approaches to teaching moral dilemmas often rely on abstract, disembodied scenarios that limit emotional engagement and reflective depth. To address this gap, we developed \textit{Ashes or Breath}, a Mixed Reality game delivered via head-mounted displays(MR-HMDs). This places players in an ethical crisis: they must save a living cat or a priceless cultural artifact during a museum fire. Designed through an iterative, values-centered process, the experience leverages embodied interaction and spatial immersion to heighten emotional stakes and provoke ethical reflection. Players face irreversible, emotionally charged choices followed by narrative consequences in a reflective room, exploring diverse perspectives and societal implications. Preliminary evaluations suggest that embedding moral dilemmas into everyday environments via MR-HMDs intensifies empathy, deepens introspection, and encourages users to reconsider their moral assumptions. This work contributes to ethics-based experiential learning in HCI, positioning augmented reality not merely as a medium of interaction but as a stage for ethical encounter.
Authors:Ann-Sophie L. Schenk, Stefan Schiffer, Heqiu Song
Abstract:
In this paper we report on first insights from interviews with teachers and students on using social robots in computer science class in sixth grade. Our focus is on learning about requirements and potential applications. We are particularly interested in getting both perspectives, the teachers' and the learners' view on how robots could be used and what features they should or should not have. Results show that teachers as well as students are very open to robots in the classroom. However, requirements are partially quite heterogeneous among the groups. This leads to complex design challenges which we discuss at the end of this paper.
Authors:Jaeung Lee, Suhyeon Yu, Yurim Jang, Simon S. Woo, Jaemin Jo
Abstract:
Machine Unlearning (MU) aims to remove target training data from a trained model so that the removed data no longer influences the model's behavior, fulfilling "right to be forgotten" obligations under data privacy laws. Yet, we observe that researchers in this rapidly emerging field face challenges in analyzing and understanding the behavior of different MU methods, especially in terms of three fundamental principles in MU: accuracy, efficiency, and privacy. Consequently, they often rely on aggregate metrics and ad-hoc evaluations, making it difficult to accurately assess the trade-offs between methods. To fill this gap, we introduce a visual analytics system, Unlearning Comparator, designed to facilitate the systematic evaluation of MU methods. Our system supports two important tasks in the evaluation process: model comparison and attack simulation. First, it allows the user to compare the behaviors of two models, such as a model generated by a certain method and a retrained baseline, at class-, instance-, and layer-levels to better understand the changes made after unlearning. Second, our system simulates membership inference attacks (MIAs) to evaluate the privacy of a method, where an attacker attempts to determine whether specific data samples were part of the original training set. We evaluate our system through a case study visually analyzing prominent MU methods and demonstrate that it helps the user not only understand model behaviors but also gain insights that can inform the improvement of MU methods.
Authors:Bixuan Ren, EunJeong Cheon, Jianghui Li
Abstract:
The rapid integration of generative artificial intelligence (GenAI) across diverse fields underscores the critical need for red teaming efforts to proactively identify and mitigate associated risks. While previous research primarily addresses technical aspects, this paper highlights organizational factors that hinder the effectiveness of red teaming in real-world settings. Through qualitative analysis of 17 semi-structured interviews with red teamers from various organizations, we uncover challenges such as the marginalization of vulnerable red teamers, the invisibility of nuanced AI risks to vulnerable users until post-deployment, and a lack of user-centered red teaming approaches. These issues often arise from underlying organizational dynamics, including organizational resistance, organizational inertia, and organizational mediocracy. To mitigate these dynamics, we discuss the implications of user research for red teaming and the importance of embedding red teaming throughout the entire development cycle of GenAI systems.
Authors:Vuong Nguyen, Gabriel Vigliensoni
Abstract:
We introduce fCrit, a dialogue-based AI system designed to critique furniture design with a focus on explainability. Grounded in reflective learning and formal analysis, fCrit employs a multi-agent architecture informed by a structured design knowledge base. We argue that explainability in the arts should not only make AI reasoning transparent but also adapt to the ways users think and talk about their designs. We demonstrate how fCrit supports this process by tailoring explanations to users' design language and cognitive framing. This work contributes to Human-Centered Explainable AI (HCXAI) in creative practice, advancing domain-specific methods for situated, dialogic, and visually grounded AI support.
Authors:Alessandro Silacci, Maurizio Caon, Mauro Cherubini
Abstract:
Virtual agents are commonly used in physical activity interventions to support behavior change, often taking the role of coaches that deliver encouragement and feedback. While effective for compliance, this role typically lacks relational depth. This pilot study explores how such agents might be perceived not just as instructors, but as co-participants: entities that appear to exert effort alongside users. Drawing on thematic analysis of semi-structured interviews with 12 participants from a prior physical activity intervention, we examine how users interpret and evaluate agent effort in social comparison contexts. Our findings reveal a recurring tension between perceived performance and authenticity. Participants valued social features when they believed others were genuinely trying. In contrast, ambiguous or implausible activity levels undermined trust and motivation. Many participants expressed skepticism toward virtual agents unless their actions reflected visible effort or were grounded in relatable human benchmarks. Based on these insights, we propose early design directions for fostering co-experienced exertion in agents, including behavioral cues, narrative grounding, and personalized performance. These insights contribute to the design of more engaging, socially resonant agents capable of supporting co-experienced physical activity.
Authors:Frank Elavsky, Cindy Xiong Bearfield
Abstract:
This paper is a collaborative piece between two worlds of expertise in the field of data visualization: accessibility and bias. In particular, the rise of generative models playing a role in accessibility is a worrying trend for data visualization. These models are increasingly used to help author visualizations as well as generate descriptions of existing visualizations for people who are blind, low vision, or use assistive technologies such as screen readers. Sighted human-to-human bias has already been established as an area of concern for theory, research, and design in data visualization. But what happens when someone is unable to verify the model output or adequately interrogate algorithmic bias, such as a context where a blind person asks a model to describe a chart for them? In such scenarios, trust from the user is not earned, rather reliance is compelled by the model-to-human relationship. In this work, we explored the dangers of AI-generated descriptions for accessibility, playing a game of telephone between models, observing bias production in model interpretation, and re-interpretation of a data visualization. We unpack ways that model failure in visualization is especially problematic for users with visual impairments, and suggest directions forward for three distinct readers of this piece: technologists who build model-assisted interfaces for end users, users with disabilities leveraging models for their own purposes, and researchers concerned with bias, accessibility, or visualization.
Authors:Shaul Ashkenazi, Gabriel Skantze, Jane Stuart-Smith, Mary Ellen Foster
Abstract:
Social robots are increasingly being deployed in public spaces, where they face not only technological difficulties and unexpected user utterances, but also objections from stakeholders who may not be comfortable with introducing a robot into those spaces. We describe our difficulties with deploying a social robot in two different public settings: 1) Student services center; 2) Refugees and asylum seekers drop-in service. Although this is a failure report, in each use case we eventually managed to earn the trust of the staff and form a relationship with them, allowing us to deploy our robot and conduct our studies.
Authors:Yousra Shleibik, Jordan Sinclair, Kerstin Haring
Abstract:
The advent of autonomous driving systems promises to transform transportation by enhancing safety, efficiency, and comfort. As these technologies evolve toward higher levels of autonomy, the need for integrated systems that seamlessly support human involvement in decision-making becomes increasingly critical. Certain scenarios necessitate human involvement, including those where the vehicle is unable to identify an object or element in the scene, and as such cannot take independent action. Therefore, situational awareness is essential to mitigate potential risks during a takeover, where a driver must assume control and autonomy from the vehicle. The need for driver attention is important to avoid collisions with external agents and ensure a smooth transition during takeover operations. This paper explores the integration of attention redirection techniques, such as gaze manipulation through targeted visual and auditory cues, to help drivers maintain focus on emerging hazards and reduce target fixation in semi-autonomous driving scenarios. We propose a conceptual framework that combines real-time gaze tracking, context-aware saliency analysis, and synchronized visual and auditory alerts to enhance situational awareness, proactively address potential hazards, and foster effective collaboration between humans and autonomous systems.
Authors:Suman Saha, Fatemeh Rahbari, Farhan Sadique, Sri Krishna Chaitanya Velamakanni, Mahfuza Farooque, William J. Rothwell
Abstract:
This paper explores integrating microlearning strategies into university curricula, particularly in computer science education, to counteract the decline in class attendance and engagement in US universities after COVID. As students increasingly opt for remote learning and recorded lectures, traditional educational approaches struggle to maintain engagement and effectiveness. Microlearning, which breaks complex subjects into manageable units, is proposed to address shorter attention spans and enhance educational outcomes. It uses interactive formats such as videos, quizzes, flashcards, and scenario-based exercises, which are especially beneficial for topics like algorithms and programming logic requiring deep understanding and ongoing practice. Adoption of microlearning is often limited by the effort needed to create such materials. This paper proposes leveraging AI tools, specifically ChatGPT, to reduce the workload for educators by automating the creation of supplementary materials. While AI can automate certain tasks, educators remain essential in guiding and shaping the learning process. This AI-enhanced approach ensures course content is kept current with the latest research and technology, with educators providing context and insights. By examining AI capabilities in microlearning, this study shows the potential to transform educational practices and outcomes in computer science, offering a practical model for combining advanced technology with established teaching methods.
Authors:Viktor von Wyl, Jürgen Bernard
Abstract:
Collaboration between health science and visual analytics research is often hindered by different, sometimes incompatible approaches to research design. Health science often follows hypothesis-driven protocols, registered in advance, and focuses on reproducibility and risk mitigation. Visual analytics, in contrast, relies on iterative data exploration, prioritizing insight generation and analytic refinement through user interaction. These differences create challenges in interdisciplinary projects, including misaligned terminology, unrealistic expectations about data readiness, divergent validation norms, or conflicting explainability requirements. To address these persistent tensions, we identify seven research needs and actions: (1) guidelines for broader community adoption, (2) agreement on quality and validation benchmarks, (3) frameworks for aligning research tasks, (4) integrated workflows combining confirmatory and exploratory stages, (5) tools for harmonizing terminology across disciplines, (6) dedicated bridging roles for transdisciplinary work, and (7) cultural adaptation and mutual recognition. We organize these needs in a framework with three areas: culture, standards, and processes. They can constitute a research agenda for developing reliable, reproducible, and clinically relevant data-centric methods.
Authors:Jens Grubert, Yvonne Sedelmaier, Dieter Landes
Abstract:
Oral examinations are a prevalent but psychologically demanding form of assessment in higher education. Many students experience intense anxiety, which can impair cognitive performance and hinder academic success. This position paper explores the potential of embodied conversational agents (ECAs) in extended reality (XR) environments to support students preparing for oral exams. We propose a system concept that integrates photorealistic ECAs with real-time capable large language models (LLMs) to enable psychologically safe, adaptive, and repeatable rehearsal of oral examination scenarios. We also discuss the potential benefits and challenges of such an envisioned system.
Authors:Tzu-Hui Wu, Sebastian Cmentowski, Yunyin Lou, Jun Hu, Regina Bernhaupt
Abstract:
Group mood plays a crucial role in shaping workspace experiences, influencing group dynamics, team performance, and creativity. The perceived group mood depends on many, often subconscious, aspects such as individual emotional states or group life, which make it challenging to maintain a positive atmosphere. Intelligent technology could support mood regulation in physical office environments, for example, as adaptive ambient lighting for mood regulation. However, little is known about the relationship between the physical workspace and group mood dynamics. To address this knowledge gap, we conducted a qualitative user study (N=8 workgroups and overall 26 participants) to explore how the physical workspace shapes group mood experiences and investigate employees' perspectives on intelligent mood-aware technologies. Our findings reveal key factors influencing group mood, and participants' expectations for supportive technology to preserve privacy and autonomy. Our work highlights the potential of adaptive and responsive workspaces while also emphasizing the need for human-centered, technology-driven interventions that benefit group well-being.
Authors:Bhavishya Tarun, Haoze Du, Dinesh Kannan, Edward F. Gehringer
Abstract:
A Human-in-the-Loop (HITL) approach leverages generative AI to enhance personalized learning by directly integrating student feedback into AI-generated solutions. Students critique and modify AI responses using predefined feedback tags, fostering deeper engagement and understanding. This empowers students to actively shape their learning, with AI serving as an adaptive partner. The system uses a tagging technique and prompt engineering to personalize content, informing a Retrieval-Augmented Generation (RAG) system to retrieve relevant educational material and adjust explanations in real time. This builds on existing research in adaptive learning, demonstrating how student-driven feedback loops can modify AI-generated responses for improved student retention and engagement, particularly in STEM education. Preliminary findings from a study with STEM students indicate improved learning outcomes and confidence compared to traditional AI tools. This work highlights AI's potential to create dynamic, feedback-driven, and personalized learning environments through iterative refinement.
Authors:Evey Jiaxin Huang, Matthew Easterday, Elizabeth Gerber
Abstract:
Entrepreneurship requires navigating open-ended, ill-defined problems: identifying risks, challenging assumptions, and making strategic decisions under deep uncertainty. Novice founders often struggle with these metacognitive demands, while mentors face limited time and visibility to provide tailored support. We present a human-AI coaching system that combines a domain-specific cognitive model of entrepreneurial risk with a large language model (LLM) to proactively scaffold both novice and mentor thinking. The system proactively poses diagnostic questions that challenge novices' thinking and helps both novices and mentors plan for more focused and emotionally attuned meetings. Critically, mentors can inspect and modify the underlying cognitive model, shaping the logic of the system to reflect their evolving needs. Through an exploratory field deployment, we found that using the system supported novice metacognition, helped mentors plan emotionally attuned strategies, and improved meeting depth, intentionality, and focus--while also surfaced key tensions around trust, misdiagnosis, and expectations of AI. We contribute design principles for proactive AI systems that scaffold metacognition and human-human collaboration in complex, ill-defined domains, offering implications for similar domains like healthcare, education, and knowledge work.
Authors:Zikai Wen, Lanjing Liu, Yaxing Yao
Abstract:
As families face increasingly complex safety challenges in digital and physical environments, generative AI (GenAI) presents new opportunities to support household safety through multiple specialized AI agents. Through a two-phase qualitative study consisting of individual interviews and collaborative sessions with 13 parent-child dyads, we explored families' conceptualizations of GenAI and their envisioned use of AI agents in daily family life. Our findings reveal that families preferred to distribute safety-related support across multiple AI agents, each embodying a familiar caregiving role: a household manager coordinating routine tasks and mitigating risks such as digital fraud and home accidents; a private tutor providing personalized educational support, including safety education; and a family therapist offering emotional support to address sensitive safety issues such as cyberbullying and digital harassment. Families emphasized the need for agent-specific privacy boundaries, recognized generational differences in trust toward AI agents, and stressed the importance of maintaining open family communication alongside the assistance of AI agents. Based on these findings, we propose a multi-agent system design featuring four privacy-preserving principles: memory segregation, conversational consent, selective data sharing, and progressive memory management to help balance safety, privacy, and autonomy within family contexts.
Authors:Lauren W. Wang, Parastoo Abtahi
Abstract:
Robots are increasingly capable of autonomous operations, yet human interaction remains essential for issuing personalized instructions. Instead of directly controlling robots through Programming by Demonstration (PbD) or teleoperation, we propose giving instructions by interacting with GhostObjects-world-aligned, life-size virtual twins of physical objects-in augmented reality (AR). By direct manipulation of GhostObjects, users can precisely specify physical goals and spatial parameters, with features including real-world lasso selection of multiple objects and snapping back to default positions, enabling tasks beyond simple pick-and-place.
Authors:Luis Vitor Zerkowski, Nina S. T. Hirata
Abstract:
Indigenous communities face ongoing challenges in preserving their cultural heritage, particularly in the face of systemic marginalization and urban development. In Brazil, the Museu Nacional dos Povos Indigenas through the Tainacan platform hosts the country's largest online collection of Indigenous objects and iconographies, providing a critical resource for cultural engagement. Using publicly available data from this repository, we present a data-driven initiative that applies artificial intelligence to enhance accessibility, interpretation, and exploration. We develop two semantic pipelines: a visual pipeline that models image-based similarity and a textual pipeline that captures semantic relationships from item descriptions. These embedding spaces are projected into two dimensions and integrated into an interactive visualization tool we also developed. In addition to similarity-based navigation, users can explore the collection through temporal and geographic lenses, enabling both semantic and contextualized perspectives. The system supports curatorial tasks, aids public engagement, and reveals latent connections within the collection. This work demonstrates how AI can ethically contribute to cultural preservation practices.
Authors:Arlindo Gomes, Emilly Brito, Luis Morais, Nivan Ferreira
Abstract:
Maps are essential to news media as they provide a familiar way to convey spatial context and present engaging narratives. However, the design of journalistic maps may be challenging, as editorial teams need to balance multiple aspects, such as aesthetics, the audience's expected data literacy, tight publication deadlines, and the team's technical skills. Data journalists often come from multiple areas and lack a cartography, data visualization, and data science background, limiting their competence in creating maps. While previous studies have examined spatial visualizations in data stories, this research seeks to gain a deeper understanding of the map design process employed by news outlets. To achieve this, we strive to answer two specific research questions: what is the design space of journalistic maps? and how do editorial teams produce journalistic map articles? To answer the first one, we collected and analyzed a large corpus of 462 journalistic maps used in news articles from five major news outlets published over three months. As a result, we created a design space comprised of eight dimensions that involved both properties describing the articles' aspects and the visual/interactive features of maps. We approach the second research question via semi-structured interviews with four data journalists who create data-driven articles daily. Through these interviews, we identified the most common design rationales made by editorial teams and potential gaps in current practices. We also collected the practitioners' feedback on our design space to externally validate it. With these results, we aim to provide researchers and journalists with empirical data to design and study journalistic maps.
Authors:Zhanming Chen, Juan F. Maestre, May Hang, Alisha Ghaju, Ji Youn Shin
Abstract:
Individuals resettled in a new environment often face challenges in accessing adequate healthcare services, particularly within the complex processes of outpatient clinic care. Cultural differences, language barriers, and low socioeconomic status contribute to these difficulties. While previous studies have identified barriers and proposed technology-mediated solutions for resettled populations, many focus on addressing deficits rather than building on the strengths these communities already possess, which limits the sustainability and relevance of these solutions in everyday life. We conducted two community-based participatory design workshops with 30 Hmong community members in a large metropolitan area in the US. Through this process, we identified four types of assets the community has gradually developed, including intergenerational support for health management and storytelling-based communication practices that facilitate relatable and culturally grounded interactions. We show how participatory design workshops can foster asset-based approaches, and discuss design implications for technologies that leverage patients' existing strengths to support their health management during outpatient visits.
Authors:Birgit Nierula, Mustafa Tevfik Lafci, Anna Melnik, Mert Akgül, Farelle Toumaleu Siewe, Sebastian Bosse
Abstract:
Proxemics, the study of spatial behavior, is fundamental to social interaction and increasingly relevant for virtual reality (VR) applications. While previous research has established that users respond to personal space violations in VR similarly as in real-world settings, phase-specific physiological responses and the modulating effects of facial expressions remain understudied. We investigated physiological and subjective responses to personal space violations by virtual avatars, to understand how threatening facial expressions and interaction phases (approach vs. standing) influence these responses. Sixteen participants experienced a 2x2 factorial design manipulating Personal Space (intrusion vs. respect) and Facial Expression (neutral vs. angry) while we recorded skin conductance response (SCR), heart rate variability (HRV), and discomfort ratings. Personal space boundaries were individually calibrated using a stop-distance procedure. Results show that SCR responses are significantly higher during the standing phase compared to the approach phase when personal space was violated, indicating that prolonged proximity within personal space boundaries is more physiologically arousing than the approach itself. Angry facial expressions significantly reduced HRV, reflecting decreased parasympathetic activity, and increased discomfort ratings, but did not amplify SCR responses. These findings demonstrate that different physiological modalities capture distinct aspects of proxemic responses: SCR primarily reflects spatial boundary violations, while HRV responds to facial threat cues. Our results provide insights for developing comprehensive multi-modal assessments of social behavior in virtual environments and inform the design of more realistic avatar interactions.
Authors:Xueer Lin, Chenyu Li, Yuhan Lyu, Zhicong Lu, Zhenhui Peng
Abstract:
More and more people, especially females, create and view beauty videos covering topics like makeup tutorials and vlogs on social media platforms. Understanding the communication strategies that creators use in these videos and how they affect viewers' engagement can help spread beauty knowledge. By coding 352 beauty videos in Rednote, this study presents a comprehensive taxonomy of communication strategies used by the creators, such as using home as the video background and displaying makeup effects when starting the narrative at the beginning. We further label and computationally classify six categories of comments that reveal viewers' engagement with beauty videos. The regression analyses reveal the effects of beauty video communication strategies on viewers' engagement; for example, calling viewers to take action at the end tends to attract more comments that debate the product's efficacy. We discuss insights into fostering the creation of beauty videos and the communication of beauty knowledge.
Authors:G. Kalyan Ramana, Sumit Yempalle, Prasad S. Onkar
Abstract:
Conceptual design is a cognitively complex task, especially in the engineering design of products having relative motion between components. Designers prefer sketching as a medium for conceptual design and use gestures and annotations to represent such relative motion. Literature suggests that static representations of motion in sketches may not achieve the intended functionality when realised, because it primarily depends on the designers' mental capabilities for motion simulation. Thus, it is important to understand the cognitive phenomena when designers are exploring concepts of articulated products. The current work is an attempt to understand design neurocognition by categorising the tasks and measuring the mental effort involved in these tasks using EEG. The analysis is intended to validate design intervention tools to support the conceptual design involving motion exploration. A novel EEG-based metric, inter-Band Relative Power Difference (inter-BRPD), is introduced to quantify mental effort. A design experiment is conducted with 32 participants, where they have to perform one control task and 2 focus tasks corresponding to the motion exploration task (MET) and the concept generation task (CGT), respectively. EEG data is recorded during the 3 tasks, cleaned, processed and analysed using the MNE library in Python. It is observed from the results that inter-BRPD captures the essence of mental effort with half the number of conventionally used parameters. The reliability and efficacy of the inter-BRPD metric are also statistically validated against literature-based cognitive metrics. With these new insights, the study opens up possibilities for creating support for conceptual design and its evaluation.
Authors:Jay L. Cunningham, Kevin Zhongyang Shao, Rock Yuren Pang, Nathaniel Mengist
Abstract:
While research has focused on surfacing and auditing algorithmic bias to ensure equitable AI development, less is known about how NLP practitioners - those directly involved in dataset development, annotation, and deployment - perceive and navigate issues of NLP data equity. This study is among the first to center practitioners' perspectives, linking their experiences to a multi-scalar AI governance framework and advancing participatory recommendations that bridge technical, policy, and community domains. Drawing on a 2024 questionnaire and focus group, we examine how U.S.-based NLP data practitioners conceptualize fairness, contend with organizational and systemic constraints, and engage emerging governance efforts such as the U.S. AI Bill of Rights. Findings reveal persistent tensions between commercial objectives and equity commitments, alongside calls for more participatory and accountable data workflows. We critically engage debates on data diversity and diversity washing, arguing that improving NLP equity requires structural governance reforms that support practitioner agency and community consent.
Authors:Malik Khadar, Daniel Runningen, Julia Tang, Stevie Chancellor, Harmanpreet Kaur
Abstract:
Data annotation underpins the success of modern AI, but the aggregation of crowd-collected datasets can harm the preservation of diverse perspectives in data. Difficult and ambiguous tasks cannot easily be collapsed into unitary labels. Prior work has shown that deliberation and discussion improve data quality and preserve diverse perspectives -- however, synchronous deliberation through crowdsourcing platforms is time-intensive and costly. In this work, we create a Socratic dialog system using Large Language Models (LLMs) to act as a deliberation partner in place of other crowdworkers. Against a benchmark of synchronous deliberation on two tasks (Sarcasm and Relation detection), our Socratic LLM encouraged participants to consider alternate annotation perspectives, update their labels as needed (with higher confidence), and resulted in higher annotation accuracy (for the Relation task where ground truth is available). Qualitative findings show that our agent's Socratic approach was effective at encouraging reasoned arguments from our participants, and that the intervention was well-received. Our methodology lays the groundwork for building scalable systems that preserve individual perspectives in generating more representative datasets.
Authors:Xi Long, Christy Boscardin, Lauren A. Maggio, Joseph A. Costello, Ralph Gonzales, Rasmyah Hammoudeh, Ki Lai, Yoon Soo Park, Brian C. Gin
Abstract:
Knowledge syntheses (literature reviews) are essential to health professions education (HPE), consolidating findings to advance theory and practice. However, they are labor-intensive, especially during data extraction. Artificial Intelligence (AI)-assisted extraction promises efficiency but raises concerns about accuracy, making it critical to distinguish AI 'hallucinations' (fabricated content) from legitimate interpretive differences. We developed an extraction platform using large language models (LLMs) to automate data extraction and compared AI to human responses across 187 publications and 17 extraction questions from a published scoping review. AI-human, human-human, and AI-AI consistencies were measured using interrater reliability (categorical) and thematic similarity ratings (open-ended). Errors were identified by comparing extracted responses to source publications. AI was highly consistent with humans for concrete, explicitly stated questions (e.g., title, aims) and lower for questions requiring subjective interpretation or absent in text (e.g., Kirkpatrick's outcomes, study rationale). Human-human consistency was not higher than AI-human and showed the same question-dependent variability. Discordant AI-human responses (769/3179 = 24.2%) were mostly due to interpretive differences (18.3%); AI inaccuracies were rare (1.51%), while humans were nearly three times more likely to state inaccuracies (4.37%). Findings suggest AI variability depends more on interpretability than hallucination. Repeating AI extraction can identify interpretive complexity or ambiguity, refining processes before human review. AI can be a transparent, trustworthy partner in knowledge synthesis, though caution is needed to preserve critical human insights.
Authors:EunJeong Cheon, Ingrid Erickson
Abstract:
The introduction of algorithms into a large number of industries has already restructured the landscape of work and threatens to continue. While a growing body of CSCW research centered on the future of work has begun to document these shifts, relatively little is known about workers' experiences beyond those of platform-mediated gig workers. In this paper, we turn to a traditional work sector, Amazon fulfillment centers (FC), to deepen our field's empirical examination of algorithmic management. Drawing on two years of ethnographic research, we show how FC workers react to managers' interventions, imposed productivity rates, and quantified objectification when subjected to labor-tracking systems in their physical work environments. Situating FC workers' resistance to algorithmic systems and metrics within the current CSCW literature allows us to explicate and link the nuanced practices of FC workers to the larger discourse of algorithmic control mechanisms. In addition, we show how FC workers' resistance practices are emblematic of 'work games'--a long-studied means by which workers agentically configure ("trick") their engagement within work systems. We argue that gaining a more nuanced understanding of workers' resistance and consent in relation to algorithmic management expands our ability to critique and potentially disassemble the economic and political forces at the root of these sociotechnical labor systems.
Authors:Sam H. Ross, Yunseo Lee, Coco K. Lee, Jayne Everson, R. Benjamin Shapiro
Abstract:
Multimodal UI design and development tools that interpret sketches or natural language descriptions of UIs inherently have notations: the inputs they can understand. In AI-based systems, notations are implicitly defined by the data used to train these systems. In order to create usable and intuitive notations for interactive design systems, we must regard, design, and evaluate these training datasets as notation specifications. To better understand the design space of notational possibilities for future design tools, we use the Cognitive Dimensions of Notations framework to analyze two possible notations for UI sketching. The first notation is the sketching rules for an existing UI sketch dataset, and the second notation is the set of sketches generated by participants in this study, where individuals sketched UIs without imposed representational rules. We imagine two systems, FixedSketch and FlexiSketch, built with each notation respectively, in order to understand the differential affordances of, and potential design requirements for, systems. We find that participants' sketches were composed of element-level notations that are ambiguous in isolation but are interpretable in context within whole designs. For many cognitive dimensions, the FlexiSketch notation supports greater intuitive creative expression and affords lower cognitive effort than the FixedSketch notation, but cannot be supported with prevailing, element-based approaches to UI sketch recognition. We argue that for future multimodal design tools to be truly human-centered, they must adopt contemporary AI methods, including transformer-based and human-in-the-loop, reinforcement learning techniques to understand users' context-rich expressive notations and corrections.
Authors:Zahra Hassanzadeh, David Haag, Lydia Chilton, Jan Smeddinck, Norman Farb, Joseph Jay Williams
Abstract:
One-minute behavior change interventions might seem too brief to matter. Could something so short really help people build healthier routines? This work explores this question through two studies examining how ultra-brief prompts might encourage meaningful actions in daily life. In a formative study, we explored how participants engaged with one-minute prompts across four domains: physical activity, eating, screen use, and mental well-being. This revealed two common design approaches: Immediate Action prompts (simple, directive tasks) and Reflection-First prompts (self-awareness before action). We then conducted a 14-day, within-subjects study comparing these two flows with 28 participants. Surprisingly, most participants did not notice differences in structure -- but responded positively when prompts felt timely, relevant, or emotionally supportive. Engagement was not shaped by flow type, but by content fit, tone, and momentary readiness. Participants also co-designed messages, favoring those with step-by-step guidance, personal meaning, or sensory detail. These results suggest that one-minute interventions, while easily dismissed, may serve as meaningful gateways into healthier routines -- if designed to feel helpful in the moment.
Authors:Gaojie Zhou, Junhua Li
Abstract:
Classification models used in brain-computer interface (BCI) are usually designed for a single BCI paradigm. This requires the redevelopment of the model when applying it to a new BCI paradigm, resulting in repeated costs and effort. Moreover, less complex deep learning models are desired for practical usage, as well as for deployment on portable devices. In or-der to fill the above gaps, we, in this study, proposed a light-weight and unified decoding model for cross-BCI-paradigm classification. The proposed model starts with a tempo-spatial convolution. It is followed by a multi-scale local feature selec-tion module, aiming to extract local features shared across BCI paradigms and generate weighted features. Finally, a mul-ti-dimensional global feature extraction module is designed, in which multi-dimensional global features are extracted from the weighted features and fused with the weighted features to form high-level feature representations associated with BCI para-digms. The results, evaluated on a mixture of three classical BCI paradigms (i.e., MI, SSVEP, and P300), demon-strate that the proposed model achieves 88.39%, 82.36%, 80.01%, and 0.8092 for accuracy, macro-precision, mac-ro-recall, and macro-F1-score, respectively, significantly out-performing the compared models. This study pro-vides a feasible solution for cross-BCI-paradigm classifica-tion. It lays a technological foundation for de-veloping a new generation of unified decoding systems, paving the way for low-cost and universal practical applications.
Authors:Wei Guo, Shunsei Yamagishi, Lei Jing
Abstract:
As the Internet of Things (IoT) continues to evolve, indoor location has become a critical element for enabling smart homes, behavioral monitoring, and elderly care. Existing WiFi-based human tracking solutions typically require specialized equipment or multiple Wi-Fi links, a limitation in most indoor settings where only a single pair of Wi-Fi devices is usually available. However, despite efforts to implement human tracking using one Wi-Fi link, significant challenges remain, such as difficulties in acquiring initial positions and blind spots in DFS estimation of tangent direction. To address these challenges, this paper proposes WPTrack, the first Wi-Fi and Pressure Insoles Fusion System for Single Target Tracking. WPTrack collects Channel State Information (CSI) from a single Wi-Fi link and pressure data from 90 insole sensors. The phase difference and Doppler velocity are computed from the CSI, while the pressure sensor data is used to calculate walking velocity. Then, we propose the CSI-pressure fusion model, integrating CSI and pressure data to accurately determine initial positions and facilitate precise human tracking. The simulation results show that the initial position localization accuracy ranges from 0.02 cm to 42.55 cm. The trajectory tracking results obtained from experimental data collected in a real-world environment closely align with the actual trajectory.
Authors:Kostiantyn Kucher, Niklas Rönnberg, Jonas Löwgren
Abstract:
Visualization is a heterogeneous field, and this aspect is often reflected by the organizational structures at higher education institutions that academic researchers in visualization and related fields including computer graphics, human-computer interaction, and media design are typically affiliated with. It may thus be a challenge for new PhD students to grasp the fragmented structure of their new workplace, form collegial relations across the institution, and to build a coherent picture of the discipline as a whole. We report an attempt to address this challenge, in the form of an introductory course on the subject of Visualization Technology and Methodology for PhD students at the Division for Media and Information Technology, Linköping University, Sweden. We discuss the course design, including interactions with other doctoral education activities and field trips to multiple research groups and units within the division (ranging from scientific visualization and computer graphics to media design and visual communication). Lessons learned from the course preparation work as well as the first instance of the course offered during autumn term 2023 can be helpful to researchers and educators aiming to establish or improve similar doctoral courses.
Authors:Liam Pram, Fabio Morreale
Abstract:
AI systems for music generation are increasingly common and easy to use, granting people without any musical background the ability to create music. Because of this, generative-AI has been marketed and celebrated as a means of democratizing music making. However, inclusivity often functions as marketable rhetoric rather than a genuine guiding principle in these industry settings. In this paper, we look at four generative-AI music making systems available to the public as of mid-2025 (AIVA, Stable Audio, Suno, and Udio) and track how they are rhetoricized by their developers, and received by users. Our aim is to investigate ideologies that are driving the early-stage development and adoption of generative-AI in music making, with a particular focus on democratization. A combination of autoethnography and digital ethnography is used to examine patterns and incongruities in rhetoric when positioned against product functionality. The results are then collated to develop a nuanced, contextual discussion. The shared ideology we map between producers and consumers is individualist, globalist, techno-liberal, and ethically evasive. It is a 'total ideology' which obfuscates individual responsibility, and through which the nature of music and musical practice is transfigured to suit generative outcomes.
Authors:Kazuki Komura, Kumi Ozaki, Seiji Yamada
Abstract:
This study investigated whether robotic agents that deal with social hierarchical relationships can reduce the dominance of superiors and equalize participation among participants in discussions with hierarchical structures. Thirty doctors and students having hierarchical relationship were gathered as participants, and an intervention experiment was conducted using a robot that can encourage participants to speak depending on social hierarchy. These were compared with strategies that intervened equally for all participants without considering hierarchy and with a no-action. The robots performed follow actions, showing backchanneling to speech, and encourage actions, prompting speech from members with less speaking time, on the basis of the hierarchical relationships among group members to equalize participation. The experimental results revealed that the robot's actions could potentially influence the speaking time among members, but it could not be conclusively stated that there were significant differences between the robot's action conditions. However, the results suggested that it might be possible to influence speaking time without decreasing the satisfaction of superiors. This indicates that in discussion scenarios where experienced superiors are likely to dominate, controlling the robot's backchanneling behavior could potentially suppress dominance and equalize participation among group members.
Authors:Mingyuan Zhong, Ajit Mallavarapu, Qing Nie
Abstract:
We present Caption, an LLM-powered content label generation tool for visual interactive elements on mobile devices. Content labels are essential for screen readers to provide announcements for image-based elements, but are often missing or uninformative due to developer neglect. Automated captioning systems attempt to address this, but are limited to on-screen context, often resulting in inaccurate or unspecific labels. To generate more accurate and descriptive labels, Caption collects next-screen context on interactive elements by navigating to the destination screen that appears after an interaction and incorporating information from both the origin and destination screens. Preliminary results show Caption generates more accurate labels than both human annotators and an LLM baseline. We expect Caption to empower developers by providing actionable accessibility suggestions and directly support on-demand repairs by screen reader users.
Authors:Kevin Zheng, Linda Huber, Aaron Stark, Nathan Kim, Francesca Lameiro, Wells Lucas Santo, Shreya Chowdhary, Eugene Kim, Justine Zhang
Abstract:
In the face of increasing austerity and threats of AI-enabled labor replacement at the University of Michigan, a group of workers and students have coalesced around the project of "AI resistance" since Fall 2024. Forming a cross-departmental coalition including librarians, faculty, staff, graduate workers, and undergraduate students, we have hosted a public workshop questioning the techno-deterministic inevitability of AI use at the University and are working with other campus organizations to maintain an ongoing organizing space. This workshop submission incorporates our reflections thus far on the strategies we've employed, the challenges to collective resistance, and our role as workers in resisting AI within the University. Our aim for this work is to provide concrete inspiration for technologists, students, and staff looking to resist AI techno-solutionism within their own universities.
Authors:Victoria Williams, Benjamin Rosman
Abstract:
Large language models have become increasingly common, used by millions of people worldwide in both professional and personal contexts. As these models continue to advance, they are frequently serving as virtual assistants and companions. In human interactions, effective communication typically involves two types of empathy: cognitive empathy (understanding others' thoughts and emotions) and affective empathy (emotionally sharing others' feelings). In this study, we investigated both cognitive and affective empathy across several small (SLMs) and large (LLMs) language models using standardized psychological tests. Our results revealed that LLMs consistently outperformed humans - including psychology students - on cognitive empathy tasks. However, despite their cognitive strengths, both small and large language models showed significantly lower affective empathy compared to human participants. These findings highlight rapid advancements in language models' ability to simulate cognitive empathy, suggesting strong potential for providing effective virtual companionship and personalized emotional support. Additionally, their high cognitive yet lower affective empathy allows objective and consistent emotional support without running the risk of emotional fatigue or bias.
Authors:Vladimir Zhurov, John Kausch, Kamran Sedig, Mostafa Milani
Abstract:
Ontologies play a central role in structuring knowledge across domains, supporting tasks such as reasoning, data integration, and semantic search. However, their large size and complexity, particularly in fields such as biomedicine, computational biology, law, and engineering, make them difficult for non-experts to navigate. Formal query languages such as SPARQL offer expressive access but require users to understand the ontology's structure and syntax. In contrast, visual exploration tools and basic keyword-based search interfaces are easier to use but often lack flexibility and expressiveness. We introduce FuzzyVis, a proof-of-concept system that enables intuitive and expressive exploration of complex ontologies. FuzzyVis integrates two key components: a fuzzy logic-based querying model built on fuzzy ontology embeddings, and an interactive visual interface for building and interpreting queries. Users can construct new composite concepts by selecting and combining existing ontology concepts using logical operators such as conjunction, disjunction, and negation. These composite concepts are matched against the ontology using fuzzy membership-based embeddings, which capture degrees of membership and support approximate, concept-level similarity search. The visual interface supports browsing, query composition, and partial search without requiring formal syntax. By combining fuzzy semantics with embedding-based reasoning, FuzzyVis enables flexible interpretation, efficient computation, and exploratory learning. Case studies demonstrate how FuzzyVis supports subtle information needs and helps users uncover relevant concepts in large, complex ontologies.
Authors:Yeana Lee Bond, Mungyeong Choe, Baker Kasim Hasan, Arsh Siddiqui, Myounghoon Jeon
Abstract:
Studies on in-vehicle conversational agents have traditionally relied on pre-scripted prompts or limited voice commands, constraining natural driver-agent interaction. To resolve this issue, the present study explored the potential of a ChatGPT-based in-vehicle agent capable of carrying continuous, multi-turn dialogues. Forty drivers participated in our experiment using a motion-based driving simulator, comparing three conditions (No agent, Pre-scripted agent, and ChatGPT-based agent) as a within-subjects variable. Results showed that the ChatGPT-based agent condition led to more stable driving performance across multiple metrics. Participants demonstrated lower variability in longitudinal acceleration, lateral acceleration, and lane deviation compared to the other two conditions. In subjective evaluations, the ChatGPT-based agent also received significantly higher ratings in competence, animacy, affective trust, and preference compared to the Pre-scripted agent. Our thematic analysis of driver-agent conversations revealed diverse interaction patterns in topics, including driving assistance/questions, entertainment requests, and anthropomorphic interactions. Our results highlight the potential of LLM-powered in-vehicle conversational agents to enhance driving safety and user experience through natural, context-rich interactions.
Authors:Shuo Han, Ahmed Karam Eldaly, Solomon Sunday Oyelere
Abstract:
Invasive ductal carcinoma (IDC) is the most prevalent form of breast cancer, and early, accurate diagnosis is critical to improving patient survival rates by guiding treatment decisions. Combining medical expertise with artificial intelligence (AI) holds significant promise for enhancing the precision and efficiency of IDC detection. In this work, we propose a human-in-the-loop (HITL) deep learning system designed to detect IDC in histopathology images. The system begins with an initial diagnosis provided by a high-performance EfficientNetV2S model, offering feedback from AI to the human expert. Medical professionals then review the AI-generated results, correct any misclassified images, and integrate the revised labels into the training dataset, forming a feedback loop from the human back to the AI. This iterative process refines the model's performance over time. The EfficientNetV2S model itself achieves state-of-the-art performance compared to existing methods in the literature, with an overall accuracy of 93.65\%. Incorporating the human-in-the-loop system further improves the model's accuracy using four experimental groups with misclassified images. These results demonstrate the potential of this collaborative approach to enhance AI performance in diagnostic systems. This work contributes to advancing automated, efficient, and highly accurate methods for IDC detection through human-AI collaboration, offering a promising direction for future AI-assisted medical diagnostics.
Authors:Valerie Tan, Jens Gerken
Abstract:
In this position paper, we discuss symptoms of attention deficit hyperactivity disorder (ADHD) in adults, as well as available forms of treatment or assistance in the context of mixed reality. Mixed reality offers many potentials for assisting adults with symptoms commonly found in (but not limited to) ADHD, but the availability of mixed reality solutions is not only limited commercially, but also limited in terms of proof-of-concept prototypes. We discuss two major challenges with attention assistance using mixed reality solutions: the limited availability of adult-specific prototypes and studies, as well as the limited number of solutions that offer continuous intervention of ADHD-like symptoms that users can employ in their daily life.
Authors:Parth G. Dangi, Yogesh Kumar Meena
Abstract:
Brain-machine interfaces (BMIs) have significantly advanced neuro-rehabilitation by enhancing motor control. However, accurately decoding continuous grasp force remains a challenge, limiting the effectiveness of BMI applications for fine motor tasks. Current models tend to prioritise algorithmic complexity rather than incorporating neurophysiological insights into force control, which is essential for developing effective neural engineering solutions. To address this, we propose EEGForceMap, an EEG-based methodology that isolates signals from the premotor-parietal region and extracts task-specific components. We construct three distinct time-frequency feature sets, which are validated by comparing them with prior studies, and use them for force prediction with linear, non-linear, and deep learning-based regressors. The performance of these regressors was evaluated on the WAY-EEG-GAL dataset that includes 12 subjects. Our results show that integrating EEGForceMap approach with regressor models yields a 61.7% improvement in subject-specific conditions (R-squared = 0.815) and a 55.7% improvement in subject-independent conditions (R-squared = 0.785) over the state-of-the-art kinematic decoder models. Furthermore, an ablation study confirms that each preprocessing step significantly enhances decoding accuracy. This work contributes to the advancement of responsive BMIs for stroke rehabilitation and assistive robotics by improving EEG-based decoding of dynamic grasp force.
Authors:Andrés Eduardo Fuentes-Cortázar, Alejandra Rivera-Hernández, José Rafael Rojano-Cáceres
Abstract:
Providing an equitable and inclusive user experience (UX) for people with disabilities (PWD) is a central goal of accessible design. In the specific case of Deaf users, whose hearing impairments impact language development and communication, it is essential to consider their specific needs during software evaluation processes. This study aimed to analyze a set of UX evaluation methods suggested in the literature as suitable for Deaf individuals, with the goal of validating their level of accessibility in real-world contexts. The research was based on a critical review and practical application of these methods, identifying their strengths and limitations in relation to the interaction, perception, and comprehension of Deaf users. Traditional evaluation instruments, commonly designed for hearing individuals, pose significant barriers when applied to Deaf users due to their re-liance on auditory and cognitive abilities, as well as the lack of consideration for commu-nicational accessibility. The results show that although these methods are frequently rec-ommended, they exhibit critical shortcomings that hinder the collection of accurate and representative data. It is concluded that it is essential to adapt UX evaluation methods to ensure genuinely accessible processes that address the communicative and cognitive needs of the Deaf community and accurately reflect their user experience.
Authors:Joseph T. Colonel, Baihan Lin
Abstract:
Word clouds are a common way to summarize qualitative interviews, yet traditional frequency-based methods often fail in conversational contexts: they surface filler words, ignore paraphrase, and fragment semantically related ideas. This limits their usefulness in early-stage analysis, when researchers need fast, interpretable overviews of what participant actually said. We introduce ThemeClouds, an open-source visualization tool that uses large language models (LLMs) to generate thematic, participant-weighted word clouds from dialogue transcripts. The system prompts an LLM to identify concept-level themes across a corpus and then counts how many unique participants mention each topic, yielding a visualization grounded in breadth of mention rather than raw term frequency. Researchers can customize prompts and visualization parameters, providing transparency and control. Using interviews from a user study comparing five recording-device configurations (31 participants; 155 transcripts, Whisper ASR), our approach surfaces more actionable device concerns than frequency clouds and topic-modeling baselines (e.g., LDA, BERTopic). We discuss design trade-offs for integrating LLM assistance into qualitative workflows, implications for interpretability and researcher agency, and opportunities for interactive analyses such as per-condition contrasts (``diff clouds'').
Authors:Sanjana Srabanti, G. Elisabeta Marai, Fabio Miranda
Abstract:
The visualization and analysis of street and pedestrian networks are important to various domain experts, including urban planners, climate researchers, and health experts. This has led to the development of new techniques for street and pedestrian network visualization, expanding how data can be shown and understood more effectively. Despite their increasing adoption, there is no established design framework to guide the creation of these visualizations while addressing the diverse requirements of various domains. When exploring a feature of interest, domain experts often need to transform, integrate, and visualize a combination of thematic data (e.g., demographic, socioeconomic, pollution) and physical data (e.g., zip codes, street networks), often spanning multiple spatial and temporal scales. This not only complicates the process of visual data exploration and system implementation for developers but also creates significant entry barriers for experts who lack a background in programming. With this in mind, in this paper, we reviewed 45 studies utilizing street-overlaid visualizations to understand how they are used. Through qualitative coding of these visualizations, we analyzed three key aspects of street and pedestrian network visualization usage: the analytical purpose they serve, the visualization approaches employed, and the data sources used in their creation. Building on this design space, we introduce StreetWeave, a declarative grammar for designing custom visualizations of multivariate spatial network data across multiple resolutions. We demonstrate how StreetWeave can be used to create various street-overlaid visualizations, enabling effective exploration and analysis of spatial data. StreetWeave is available at https://urbantk.org/streetweave.
Authors:Mallory Knodel, Mallika Balakrishnan, Lauren M. Chambers
Abstract:
The label `public interest technology' (PIT) is growing in popularity among those seeking to use `tech for good' - especially among technical practitioners working in civil society and nonprofit organizations. PIT encompasses a broad range of sociotechnical work across professional domains and sectors; however, the trend remains understudied within sociotechnical research. This paper describes a mixed-methods study, designed and conducted by PIT practitioners at the Center for Democracy and Technology, that characterizes technologists within the specific context of civil society, civil rights, and advocacy organizations in North America and Western Europe. We conducted interviews with civil society leaders to investigate how PIT practitioners position the field and themselves, and we held a roundtable discussion bringing diverse voices together to make meaning of this growing phenomenon. Ultimately, we find that PIT remains both defined and plagued by its expansiveness, and that today's civil society public interest technologists see a need for both (a) more robust professionalization infrastructures, including philanthropic attention, and (b) more engaged, coherent community. This study illuminates a nascent intersection of technology and policy on-the-ground that is of growing relevance to critical sociotechnical research on the shifting relationship between computing and society.
Authors:Paul C. Parsons, Prakash Chandra Shukla
Abstract:
Visualization design is often described as the process of solving a well-defined problem by navigating a design space. While existing visualization design models have provided valuable structure and guidance, they tend to foreground technical problem-solving and underemphasize the interpretive, judgment-based aspects of design. In contrast, research in other design disciplines has emphasized the importance of framing--how designers define and redefine what the problem is--and the co-evolution of problem and solution spaces through reflective practice. These dimensions remain underexplored in visualization research, particularly from the perspective of expert practitioners. This paper investigates how visualization designers frame problems and navigate the dynamic interplay between problem understanding and solution development. We conducted a mixed-methods study with 11 expert practitioners using design challenges, diary entries, and semi-structured interviews. Through reflexive thematic analysis, we identified key strategies that participants used to frame problems, reframe them in response to evolving constraints or insights, and build bridges between problem and solution spaces. These included using metaphors, heuristics, sketching, primary generators, and reflective evaluation of failed or incomplete ideas. Our findings contribute an empirically grounded account of visualization design as a reflective, co-evolutionary practice, where framing is not a preliminary step but a continuous activity embedded in design. Participants often reshaped their understanding of the problem based on solution attempts, tool feedback, and ethical or narrative concerns. These insights extend current visualization design models and highlight the need for frameworks that better account for framing and interpretive judgment. (See paper for full abstract.)
Authors:Kyuwon Kim, Jaeryeong Hwang, Younseo Lee, Jeanhee Lee, Sung-Eun Kim, Hyo-Jeong So
Abstract:
As complex societal issues continue to emerge, fostering democratic skills like valuing diverse perspectives and collaborative decision-making is increasingly vital in education. In this paper, we propose a Peer Agent (PA) system designed to simulate a deliberative conversational partner that induces socio-cognitive conflict within dilemma-based game play. Drawing on by the Inner Thoughts framework and grounded in value-sensitive discourse analysis, the PA actively participates in voice-based multi-party deliberation with human players. The system architecture consists of five core modules: Context Interpreter, Agent State Manager, Thought Generator, Thought Evaluator, and Thought Articulator.
Authors:Danyang Fan, Walker Smith, Takako Fujioka, Chris Chafe, Sile O'Modhrain, Diana Deutsch, Sean Follmer
Abstract:
Sonification offers a non-visual way to understand data, with pitch-based encodings being the most common. Yet, how well people perceive slope and acceleration-key features of data trends-remains poorly understood. Drawing on people's natural abilities to perceive tempo, we introduce a novel sampling method for pitch-based sonification to enhance the perception of slope and acceleration in univariate functions. While traditional sonification methods often sample data at uniform x-spacing, yielding notes played at a fixed tempo with variable pitch intervals (Variable Pitch Interval), our approach samples at uniform y-spacing, producing notes with consistent pitch intervals but variable tempo (Variable Tempo). We conducted psychoacoustic experiments to understand slope and acceleration perception across three sampling methods: Variable Pitch Interval, Variable Tempo, and a Continuous (no sampling) baseline. In slope comparison tasks, Variable Tempo was more accurate than the other methods when modulated by the magnitude ratio between slopes. For acceleration perception, just-noticeable differences under Variable Tempo were over 13 times finer than with other methods. Participants also commonly reported higher confidence, lower mental effort, and a stronger preference for Variable Tempo compared to other methods. This work contributes models of slope and acceleration perception across pitch-based sonification techniques, introduces Variable Tempo as a novel and preferred sampling method, and provides promising initial evidence that leveraging timing can lead to more sensitive, accurate, and precise interpretation of derivative-based data features.
Authors:Deógenes P. da Silva Junior, Jonas Lopes Guerra, Krissia Menezes, Marisa Sel Franco, Roberto Pereira
Abstract:
This report presents the results of an exploratory analysis of the work context of Community Health Agents and Endemic Disease Control Agents in Primary Health Care (PHC), with a particular focus on Health Campaigns. To understand this context, the study adopted the Socially Aware Design framework, which employs artifacts and techniques to examine problem domains in a comprehensive and sociotechnical manner. Methods such as the Stakeholder Identification Diagram, Evaluation Frame, and Semiotic Framework were applied to identify stakeholders, anticipate challenges, and elicit social and technical requirements for the solution. Personas and Scenarios were also used to illustrate the potential impacts of a solution on various stakeholders and their life contexts within health campaigns. This report presents the analysis method, its application, and results, discussing the study's findings to inform the development of medium-fidelity prototypes for a PHC health campaign management solution.
Authors:Alex Escalante Viteri, Javier Gamboa Cruzado, Leonidas Asto Huaman
Abstract:
Business intelligence in the banking industry has been studied extensively in the last decade; however, business executives still do not perceive efficiency in the decision-making process since the management and treatment of information are very timeconsuming for the deliverer, generating costs in the process. On the other hand, there is no formal methodology for developing business intelligence solutions in this sector. This work aims to optimize decision-making in a business unit that works with internet banking companies, reducing the time, the number of people, and the costs involved in decision-making. To meet the objective, basic and applied research was conducted. The basic research allowed the construction of a new methodology from a study of critical success factors and approaches from the business intelligence literature. The applied research involved the implementation of a business intelligence solution applying the new methodology in a pre-experimental study. Thirty decision-making processes were analyzed using pre-test and post-test data. Tools such as a stopwatch and observation were used to collect and record data on time spent, the number of people, and the decision-making costs. This information was processed in the specialized Minitab18 statistical software, which allowed the observation and confirmation of relevant results regarding time reduction, the number of people, and the costs generated. Therefore, it was concluded that the business intelligence solution, applying the new methodology, optimized decision making in the business unit that works with internet banking for companies.
Authors:Yuvraj Virk, Dongyu Liu
Abstract:
Non-technical end-users increasingly rely on AI code generation to perform technical tasks like data analysis. However, large language models (LLMs) remain unreliable, and it is unclear whether end-users can effectively identify model errors $\unicode{x2014}$ especially in realistic and domain-specific scenarios. We surveyed marketing and sales professionals to assess their ability to critically evaluate LLM-generated analyses of marketing data. Participants were shown natural language explanations of the AI's code, repeatedly informed the AI often makes mistakes, and explicitly prompted to identify them. Yet, participants frequently failed to detect critical flaws that could compromise decision-making, many of which required no technical knowledge to recognize. To investigate why, we reformatted AI responses into clearly delineated steps and provided alternative approaches for each decision to support critical evaluation. While these changes had a positive effect, participants often struggled to reason through the AI's steps and alternatives. Our findings suggest that business professionals cannot reliably verify AI-generated data analyses on their own and explore reasons why to inform future designs. As non-programmers adopt code-generating AI for technical tasks, unreliable AI and insufficient human oversight poses risks of unsafe or low-quality decisions.
Authors:Péter Mihajlik, Ãva Székely, Piroska Barta, Máté Soma Kádár, Gergely Dobsinszki, László Tóth
Abstract:
We present a case study on developing a customized speech-to-text system for a Hungarian speaker with severe dysarthria. State-of-the-art automatic speech recognition (ASR) models struggle with zero-shot transcription of dysarthric speech, yielding high error rates. To improve performance with limited real dysarthric data, we fine-tune an ASR model using synthetic speech generated via a personalized text-to-speech (TTS) system. We introduce a method for generating synthetic dysarthric speech with controlled severity by leveraging premorbidity recordings of the given speaker and speaker embedding interpolation, enabling ASR fine-tuning on a continuum of impairments. Fine-tuning on both real and synthetic dysarthric speech reduces the character error rate (CER) from 36-51% (zero-shot) to 7.3%. Our monolingual FastConformer_Hu ASR model significantly outperforms Whisper-turbo when fine-tuned on the same data, and the inclusion of synthetic speech contributes to an 18% relative CER reduction. These results highlight the potential of personalized ASR systems for improving accessibility for individuals with severe speech impairments.
Authors:Christian Meske, Justin Brenne, Erdi Uenal, Sabahat Oelcer, Ayseguel Doganguen
Abstract:
Current explainable AI (XAI) approaches prioritize algorithmic transparency and present explanations in abstract, non-adaptive formats that often fail to support meaningful end-user understanding. This paper introduces "Explanatory AI" as a complementary paradigm that leverages generative AI capabilities to serve as explanatory partners for human understanding rather than providers of algorithmic transparency. While XAI reveals algorithmic decision processes for model validation, Explanatory AI addresses contextual reasoning to support human decision-making in sociotechnical contexts. We develop a definition and systematic eight-dimensional conceptual model distinguishing Explanatory AI through narrative communication, adaptive personalization, and progressive disclosure principles. Empirical validation through Rapid Contextual Design methodology with healthcare professionals demonstrates that users consistently prefer context-sensitive, multimodal explanations over technical transparency. Our findings reveal the practical urgency for AI systems designed for human comprehension rather than algorithmic introspection, establishing a comprehensive research agenda for advancing user-centered AI explanation approaches across diverse domains and cultural contexts.
Authors:Durjoy Chandra Paul, Gaurob Saha, Md Amjad Hossain
Abstract:
Recognizing emotional signals in speech has a significant impact on enhancing the effectiveness of human-computer interaction (HCI). This study introduces EmoAugNet, a hybrid deep learning framework, that incorporates Long Short-Term Memory (LSTM) layers with one-dimensional Convolutional Neural Networks (1D-CNN) to enable reliable Speech Emotion Recognition (SER). The quality and variety of the features that are taken from speech signals have a significant impact on how well SER systems perform. A comprehensive speech data augmentation strategy was used to combine both traditional methods, such as noise addition, pitch shifting, and time stretching, with a novel combination-based augmentation pipeline to enhance generalization and reduce overfitting. Each audio sample was transformed into a high-dimensional feature vector using root mean square energy (RMSE), Mel-frequency Cepstral Coefficient (MFCC), and zero-crossing rate (ZCR). Our model with ReLU activation has a weighted accuracy of 95.78\% and unweighted accuracy of 92.52\% on the IEMOCAP dataset and, with ELU activation, has a weighted accuracy of 96.75\% and unweighted accuracy of 91.28\%. On the RAVDESS dataset, we get a weighted accuracy of 94.53\% and 94.98\% unweighted accuracy for ReLU activation and 93.72\% weighted accuracy and 94.64\% unweighted accuracy for ELU activation. These results highlight EmoAugNet's effectiveness in improving the robustness and performance of SER systems through integated data augmentation and hybrid modeling.
Authors:Xinming Yang, Haasil Pujara, Jun Li
Abstract:
While Large Language Models (LLMs) are often used as virtual tutors in computer science (CS) education, this approach can foster passive learning and over-reliance. This paper presents a novel pedagogical paradigm that inverts this model: students act as instructors who must teach an LLM to solve problems. To facilitate this, we developed strategies for designing questions with engineered knowledge gaps that only a student can bridge, and we introduce Socrates, a system for deploying this method with minimal overhead. We evaluated our approach in an undergraduate course and found that this active-learning method led to statistically significant improvements in student performance compared to historical cohorts. Our work demonstrates a practical, cost-effective framework for using LLMs to deepen student engagement and mastery.
Authors:Stefan Pasch, Min Chul Cha
Abstract:
As AI systems become increasingly embedded in organizational workflows and consumer applications, ethical principles such as fairness, transparency, and robustness have been widely endorsed in policy and industry guidelines. However, there is still scarce empirical evidence on whether these principles are recognized, valued, or impactful from the perspective of users. This study investigates the link between ethical AI and user satisfaction by analyzing over 100,000 user reviews of AI products from G2. Using transformer-based language models, we measure sentiment across seven ethical dimensions defined by the EU Ethics Guidelines for Trustworthy AI. Our findings show that all seven dimensions are positively associated with user satisfaction. Yet, this relationship varies systematically across user and product types. Technical users and reviewers of AI development platforms more frequently discuss system-level concerns (e.g., transparency, data governance), while non-technical users and reviewers of end-user applications emphasize human-centric dimensions (e.g., human agency, societal well-being). Moreover, the association between ethical AI and user satisfaction is significantly stronger for non-technical users and end-user applications across all dimensions. Our results highlight the importance of ethical AI design from users' perspectives and underscore the need to account for contextual differences across user roles and product types.
Authors:Jules Clerc, Domitile Lourdeaux, Mohamed Sallak, Johann Barbier, Marc Ravaine
Abstract:
Interactive Narrative Systems (INS) have revolutionized digital experiences by empowering users to actively shape their stories, diverging from traditional passive storytelling. However, the field faces challenges due to fragmented research efforts and diverse system representations. This paper introduces a formal representation framework for INS, inspired by diverse approaches from the state of the art. By providing a consistent vocabulary and modeling structure, the framework facilitates the analysis, the description and comparison of INS properties. Experimental validations on the "Little Red Riding Hood" scenario highlight the usefulness of the proposed formalism and its impact on improving the evaluation of INS. This work aims to foster collaboration and coherence within the INS research community by proposing a methodology for formally representing these systems.
Authors:Ahmed Abdal Shafi Rasel, Ahmed Mustafa Amlan, Tasmim Shajahan Mim, Tanvir Hasan
Abstract:
This study explores perceptions of fairness in algorithmic decision-making among users in Bangladesh through a comprehensive mixed-methods approach. By integrating quantitative survey data with qualitative interview insights, we examine how cultural, social, and contextual factors influence users' understanding of fairness, transparency, and accountability in AI systems. Our findings reveal nuanced attitudes toward human oversight, explanation mechanisms, and contestability, highlighting the importance of culturally aware design principles for equitable and trustworthy algorithmic systems. These insights contribute to ongoing discussions on algorithmic fairness by foregrounding perspectives from a non-Western context, thus broadening the global dialogue on ethical AI deployment.
Authors:Margarida Romero, George Kalmpourtzis
Abstract:
Metacognition is an important aspect in creative problem solving (CPS) and through this chapter we analyse the meta-reasoning aspects applied in the different processes of monitoring the progress of learners' reasoning and CPS activities. Meta-reasoning monitors the way that problem-solving processes advance and regulate time and efforts towards a solution. In the context of an ill-defined problem, exploration is required to develop a better-defined problem space and advance towards the solution space. The way learners engage in exploration and exploitations is regulated by the meta-reasoning within the CPS activity. The objective of this chapter is to examine and identify the CPS process with educational robots through a metacognitive and interactionist approach. This chapter presents a case study, where, to solve a problem, a participant had to explore a set of robot cubes to develop the technological knowledge associated with each single component of the system, but also conceptualize a system-level behaviour of the cubes when they are assembled. The chapter presents the emergence of knowledge through the metacognitive regulation of the process of exploration and exploitation of prior knowledge and emergent knowledge until finding a solution
Authors:Anand Kumar, Antony Albert Raj Irudayaraj, Ishita Chandra, Adwait Sharma, Aditya Shekhar Nittala
Abstract:
Gesture recognition with electromyography (EMG) is a complex problem influenced by gesture sets, electrode count and placement, and machine learning parameters (e.g., features, classifiers). Most existing toolkits focus on streamlining model development but overlook the impact of electrode selection on classification accuracy. In this work, we present the first data-driven analysis of how electrode selection and classifier choice affect both accuracy and sparsity. Through a systematic evaluation of 28 combinations (4 selection schemes, 7 classifiers), across six datasets, we identify an approach that minimizes electrode count without compromising accuracy. The results show that Permutation Importance (selection scheme) with Random Forest (classifier) reduces the number of electrodes by 53.5\%. Based on these findings, we introduce SparseEMG, a design tool that generates sparse electrode layouts based on user-selected gesture sets, electrode constraints, and ML parameters while also predicting classification performance. SparseEMG supports 50+ unique gestures and is validated in three real-world applications using different hardware setups. Results from our multi-dataset evaluation show that the layouts generated from the SparseEMG design tool are transferable across users with only minimal variation in gesture recognition performance.
Authors:Sam Johnson-Lacoss, Santiago V. Lombeyda, S. George Djorgovski
Abstract:
Mixed reality (MR) environments are bound to become ubiquitous as MR technology becomes lighter, higher resolution, more affordable, and overall becomes a seamless extension of our current work and living spaces. For research scientists and clinicians focused on understanding 3D phenomena or patient pathologies within the context of the larger human anatomy, that means a necessary evolution of their workstations currently only utilizing 2D interfaces for everyday communication, logistics and data analysis. MR technologies bring forth immersive 3D representations coexisting in our natural spaces, while allowing for richer interconnected information displays, where 3D representations greatly aid in the detailed understanding of physical structures, spatial relationships, and 3D contextualization of 2D measurements, projections, abstractions, and other data details. We present a breakdown of the different interaction zones and modalities into a design space that best accommodates the creation of applications for users engaged through MR technologies in precise object-centric data analysis within the ergonomic confines of their desktop physical spaces.
Authors:Vaanee Tripathi, Aalok Thakkar
Abstract:
Computer science education has evolved extensively; however, systemic barriers still prevent students with visual impairments from fully participating. While existing research has developed specialized programming tools and assistive technologies, these solutions remain fragmented and often require complex technical infrastructure, which limits their classroom implementation. Current approaches treat accessibility as individual accommodations rather than integral curriculum design, creating gaps in holistic educational support. This paper presents a comprehensive framework for redesigning introductory computer science curricula to provide equitable learning experiences for students with visual impairments without requiring specialized technical infrastructure. The framework outlines five key components that together contribute a systematic approach to curriculum accessibility: accessible learning resources with pre-distributed materials and tactile diagrams, in-class learning kits with hands-on demonstrations, structured support systems with dedicated teaching assistance, an online tool repository, and psychosocial support for classroom participation. Unlike existing tool-focused solutions, this framework addresses both technical and pedagogical dimensions of inclusive education while emphasizing practical implementation in standard university settings. The design is grounded in universal design principles and validated through expert consultation with accessibility specialists and disability services professionals, establishing foundations for future empirical evaluation of learning outcomes and student engagement while serving as a template for broader institutional adoption.
Authors:Yuqi Hu, Qiwen Xiong, Zhenzhen Qin, Brandon Watanabe, Yujing Wang, Mirjana Prpa, Ilmi Yoon
Abstract:
Root Cause Analysis (RCA) is a critical tool for investigating adverse events in healthcare and improving patient safety. However, existing RCA training programs are often limited by high resource demands, leading to insufficient training and inconsistent implementation. To address this challenge, we present an AI-powered 3D simulation game that helps healthcare professionals develop RCA skills through interactive, immersive simulations. This approach offers a cost-effective, scalable, and accessible alternative to traditional training. The prototype simulates an RCA investigation following a death in the ICU, where learners interview five virtual avatars representing ICU team members to investigate the incident and complete a written report. The system enables natural, life-like interactions with avatars via large language models (LLMs), emotional text-to-speech, and AI-powered animations. An additional LLM component provides formative and summative feedback to support continual improvement. We conclude by outlining plans to empirically evaluate the system's efficacy.
Authors:Theia Henderson, David R. Karger, David D. Clark
Abstract:
Most social applications, from Twitter to Wikipedia, have rigid one-size-fits-all designs, but building new social applications is both technically challenging and results in applications that are siloed away from existing communities. We present Graffiti, a system that can be used to build a wide variety of personalized social applications with relative ease that also interoperate with each other. People can freely move between a plurality of designs -- each with its own aesthetic, feature set, and moderation -- all without losing their friends or data.
Our concept of total reification makes it possible for seemingly contradictory designs, including conflicting moderation rules, to interoperate. Conversely, our concept of channels prevents interoperation from occurring by accident, avoiding context collapse.
Graffiti applications interact through a minimal client-side API, which we show admits at least two decentralized implementations. Above the API, we built a Vue plugin, which we use to develop applications similar to Twitter, Messenger, and Wikipedia using only client-side code. Our case studies explore how these and other novel applications interoperate, as well as the broader ecosystem that Graffiti enables.
Authors:Vishnu Menon, Andy Cherney, Elizabeth B. Cloude, Li Zhang, Tiffany D. Do
Abstract:
This study examined whether embedding LLM-guided reflection prompts in an interactive AI-generated podcast improved learning and user experience compared to a version without prompts. Thirty-six undergraduates participated, and while learning outcomes were similar across conditions, reflection prompts reduced perceived attractiveness, highlighting a call for more research on reflective interactivity design.
Authors:Natalia Echeverry, Arun Lekshmi Narayanan
Abstract:
A survey of 26 CS students reveals that AI coding assistants are mainly used for writing code (second to online searches) while AI chatbots are the top resource for debugging. Participants with different coding experience prefer online help over direct human help from peers and instructors.
Authors:Zhu Yuting, Cao Xinyu, Su Yuzhuo, Ma Yongbin
Abstract:
A common challenge for e-commerce sellers is to decide what product images to display on online shopping sites. In this paper, we propose and validate a novel metric, k-value, to quantify the information richness of an image set, and we further investigate its effect on consumers' purchase decisions. We leverage patch-level embeddings from Vision Transformers (ViT) and apply k-means clustering to identify distinct visual features, defining k-value as the number of clusters. An online experiment demonstrates that k-value aligns with human-perceived information richness, validating the metric. A simulated online shopping experiment further reveals a significant yet counterintuitive result: while an image set with a higher k-value (richer information) shortens decision time, it paradoxically reduces purchase propensity. Our findings illuminate the complex relationship between visual information richness and consumer behavior, providing sellers a quantifiable tool for image selection.
Authors:Thassilo M. Schiepanski, Nicholas Piël
Abstract:
Frontier LLMs only recently enabled serviceable, autonomous web agents. At that, a model poses as an instantaneous domain model backend. Ought to suggest interaction, it is consulted with a web-based task and respective application state. The key problem lies in application state serialisation $\unicode{x2013}$ referred to as snapshot. State-of-the-art web agents are premised on grounded GUI snapshots, i.e., screenshots enhanced with visual cues. Not least to resemble human perception, but for images representing relatively cheap means of model input. LLM vision still lag behind code interpretation capabilities. DOM snapshots, which structurally resemble HTML, impose a desired alternative. Vast model input token size, however, disables reliable implementation with web agents to date.
We propose D2Snap, a first-of-its-kind DOM downsampling algorithm. Based on a GPT-4o backend, we evaluate D2Snap on tasks sampled from the Online-Mind2Web dataset. The success rate of D2Snap-downsampled DOM snapshots (67%) matches a grounded GUI snapshot baseline (65%) $\unicode{x2013}$ within the same input token order of magnitude (1e3). Our best evaluated configurations $\unicode{x2013}$ one token order above, but within the model's context window $\unicode{x2013}$ outperform this baseline by 8%. Our evaluation, moreover, yields that DOM-inherent hierarchy embodies a strong UI feature for LLMs.
Authors:Carlos Andrés RamÃrez Cataño, Makoto Itoh
Abstract:
Software defect prediction using code metrics has been extensively researched over the past five decades. However, prediction harnessing non-software metrics is under-researched. Considering that the root cause of software defects is often attributed to human error, human factors theory might offer key forecasting metrics for actionable insights. This paper explores automated software defect prediction at the method level based on the developers' coding habits. First, we propose a framework for deciding the metrics to conduct predictions. Next, we compare the performance of our metrics to that of the code and commit history metrics shown by research to achieve the highest performance to date. Finally, we analyze the prediction importance of each metric. As a result of our analyses of twenty-one critical infrastructure large-scale open-source software projects, we have presented: (1) a human error-based framework with metrics useful for defect prediction at method level; (2) models using our proposed metrics achieve better average prediction performance than the state-of-the-art code metrics and history measures; (3) the prediction importance of all metrics distributes differently with each of the novel metrics having better average importance than code and history metrics; (4) the novel metrics dramatically enhance the explainability, practicality, and actionability of software defect prediction models, significantly advancing the field. We present a systematic approach to forecasting defect-prone software methods via a human error framework. This work empowers practitioners to act on predictions, empirically demonstrating how developer coding habits contribute to defects in software systems.
Authors:Muhammad Akmal Bin Mohammed Zaffir, Daisuke Sakai, Yuki Sato, Takahiro Wada
Abstract:
Studies suggest that involuntary eye movements exhibit greater stability during active motion compared to passive motion, and this effect may also apply to the operation of ride-on machinery. Moreover, a study suggested that experimentally manipulating the sense of agency (SoA) by introducing delays may influence the stability of involuntary eye movements. Although a preliminary investigation examined involuntary eye movements and perceived maneuverability under two distinct machine dynamics with preserved SoA, it remains unclear how systematic variations in motion dynamics influence these factors. Therefore, the purpose of the present research was to investigate whether systematic variations in the dynamic properties of a ride-on machine, where the perceived maneuverability is modulated, influence the accuracy of involuntary eye movements in human operators. Participants rode a yaw-rotational platform whose time constant from joystick input to motor torque of a rotational machine was systematically manipulated. During the operation, eye movements were recorded while participants fixated on a visual target. After each condition, participants provided subjective ratings of maneuverability and cognitive load. As the platform's time constant increased, the perceived maneuverability scores decreased while the cognitive loads increased. Concurrently, involuntary eye movement accuracy decreased. Moderate to weak positive correlations emerged between the perceived maneuverability scores and the eye movement gain and accuracy, while a weak negative correlation was found with cognitive load.
Authors:Bertram Fuchs, Mehdi Ejtehadi, Ana Cisnal, Jürgen Pannek, Anke Scheel-Sailer, Robert Riener, Inge Eriks-Hoogland, Diego Paez-Granados
Abstract:
Autonomic Dysreflexia (AD) is a potentially life-threatening condition characterized by sudden, severe blood pressure (BP) spikes in individuals with spinal cord injury (SCI). Early, accurate detection is essential to prevent cardiovascular complications, yet current monitoring methods are either invasive or rely on subjective symptom reporting, limiting applicability in daily file. This study presents a non-invasive, explainable machine learning framework for detecting AD using multimodal wearable sensors. Data were collected from 27 individuals with chronic SCI during urodynamic studies, including electrocardiography (ECG), photoplethysmography (PPG), bioimpedance (BioZ), temperature, respiratory rate (RR), and heart rate (HR), across three commercial devices. Objective AD labels were derived from synchronized cuff-based BP measurements. Following signal preprocessing and feature extraction, BorutaSHAP was used for robust feature selection, and SHAP values for explainability. We trained modality- and device-specific weak learners and aggregated them using a stacked ensemble meta-model. Cross-validation was stratified by participants to ensure generalizability. HR- and ECG-derived features were identified as the most informative, particularly those capturing rhythm morphology and variability. The Nearest Centroid ensemble yielded the highest performance (Macro F1 = 0.77+/-0.03), significantly outperforming baseline models. Among modalities, HR achieved the highest area under the curve (AUC = 0.93), followed by ECG (0.88) and PPG (0.86). RR and temperature features contributed less to overall accuracy, consistent with missing data and low specificity. The model proved robust to sensor dropout and aligned well with clinical AD events. These results represent an important step toward personalized, real-time monitoring for individuals with SCI.
Authors:Shengnan Yang, Rongqian Ma
Abstract:
As AI systems become integral to knowledge-intensive work, questions arise not only about their functionality but also their epistemic roles in human-AI interaction. While HCI research has proposed various AI role typologies, it often overlooks how AI reshapes users' roles as knowledge contributors. This study examines how users form epistemic relationships with AI-how they assess, trust, and collaborate with it in research and teaching contexts. Based on 31 interviews with academics across disciplines, we developed a five-part codebook and identified five relationship types: Instrumental Reliance, Contingent Delegation, Co-agency Collaboration, Authority Displacement, and Epistemic Abstention. These reflect variations in trust, assessment modes, tasks, and human epistemic status. Our findings show that epistemic roles are dynamic and context-dependent. We argue for shifting beyond static metaphors of AI toward a more nuanced framework that captures how humans and AI co-construct knowledge, enriching HCI's understanding of the relational and normative dimensions of AI use.
Authors:Ariya Mukherjee-Gandhi, Oliver Muellerklein
Abstract:
As generative AI continues to reshape artistic production and alternate modes of human expression, artists whose livelihoods are most directly affected have raised urgent concerns about consent, transparency, and the future of creative labor. However, the voices of artists are often marginalized in dominant public and scholarly discourse. This study presents a twelve-year analysis, from 2013 to 2025, of English-language discourse surrounding AI-generated art. It draws from 439 curated 500-word excerpts sampled from opinion articles, news reports, blogs, legal filings, and spoken-word transcripts. Through a reproducible methodology, we identify five stable thematic clusters and uncover a misalignment between artists' perceptions and prevailing media narratives. Our findings highlight how the use of technical jargon can function as a subtle form of gatekeeping, often sidelining the very issues artists deem most urgent. Our work provides a BERTopic-based methodology and a multimodal baseline for future research, alongside a clear call for deeper, transparency-driven engagement with artist perspectives in the evolving AI-creative landscape.
Authors:Wayupuk Sommuang, Kun Kerdthaisong, Pasin Buakhaw, Aslan B. Wong, Nutchanon Yongsatianchot
Abstract:
Students' mental well-being is vital for academic success, with activities such as studying, socializing, and sleeping playing a role. Current mobile sensing data highlight this intricate link using statistical and machine learning analyses. We propose a novel LLM agent-based simulation framework to model student activities and mental health using the StudentLife Dataset. Each LLM agent was initialized with personality questionnaires and guided by smartphone sensing data throughout the simulated semester. These agents predict individual behaviors, provide self-reported mental health data via ecological momentary assessments (EMAs), and complete follow-up personality questionnaires. To ensure accuracy, we investigated various prompting techniques, memory systems, and activity-based mental state management strategies that dynamically update an agent's mental state based on their daily activities. This simulation goes beyond simply replicating existing data. This allows us to explore new scenarios that are not present in the original dataset, such as peer influence through agent-to-agent interactions and the impact of social media. Furthermore, we can conduct intervention studies by manipulating activity patterns via sensing signals and personality traits using questionnaire responses. This provides valuable insights into the behavioral changes that could enhance student well-being. The framework also facilitates hypothetical interviews with LLM agents, offering deeper insights into their mental health. This study showcases the power of LLM-driven behavioral modeling with sensing data, opening new avenues for understanding and supporting student mental health.
Authors:Andrew McNutt, Shiyi He, Sujit Kumar Kamaraj, Purbid Bambroo, Nastaran Jadidi, John Bovard, Chang Han
Abstract:
Critical Visualization is gaining popularity and academic focus, yet relatively few academic courses have been offered to support students in this complex area. This experience report describes a recent experimental course on the topic, exploring both what the topic could be as well as an experimental content structure (namely as scavenger hunt). Generally the course was successful, achieving the learning objectives of developing critical thinking skills, improving communication about complex ideas, and developing a knowledge about theories in the area. While improvements can be made, we hope that humanistic notions of criticality are embraced more deeply in visualization pedagogy.
Authors:MatouÅ¡ JelÃnek, Nadine Schlicker, Ewart de Visser
Abstract:
Calibrated trust in automated systems (Lee and See 2004) is critical for their safe and seamless integration into society. Users should only rely on a system recommendation when it is actually correct and reject it when it is factually wrong. One requirement to achieve this goal is an accurate trustworthiness assessment, ensuring that the user's perception of the system's trustworthiness aligns with its actual trustworthiness, allowing users to make informed decisions about the extent to which they can rely on the system (Schlicker et al. 2022). We propose six design guidelines to help designers optimize for accurate trustworthiness assessments, thus fostering ethical and responsible human-automation interactions. The proposed guidelines are derived from existing literature in various fields, such as human-computer interaction, cognitive psychology, automation research, user-experience design, and ethics. We are incorporating key principles from the field of pragmatics, specifically the cultivation of common ground (H. H. Clark 1996) and Gricean communication maxims (Grice 1975). These principles are essential for the design of automated systems because the user's perception of the system's trustworthiness is shaped by both environmental contexts, such as organizational culture or societal norms, and by situational context, including the specific circumstances or scenarios in which the interaction occurs (Hoff and Bashir 2015). Our proposed guidelines provide actionable insights for designers to create automated systems that make relevant trustworthiness cues available. This would ideally foster calibrated trust and more satisfactory, productive, and safe interactions between humans and automated systems. Furthermore, the proposed heuristics might work as a tool for evaluating to what extent existing systems enable users to accurately assess a system's trustworthiness.
Authors:Arjun Kumar, Noppanat Wadlom, Jaeheon Kwak, Si-Hyuck Kang, Insik Shin
Abstract:
Arrhythmia is a common cardiac condition that can precipitate severe complications without timely intervention. While continuous monitoring is essential for timely diagnosis, conventional approaches such as electrocardiogram and wearable devices are constrained by their reliance on specialized medical expertise and patient discomfort from their contact nature. Existing contactless monitoring, primarily designed for healthy subjects, face significant challenges when analyzing reflected signals from arrhythmia patients due to disrupted spatial stability and temporal consistency.
In this paper, we introduce mCardiacDx, a radar-driven contactless system that accurately analyzes reflected signals and reconstructs heart pulse waveforms for arrhythmia monitoring and diagnosis. The key contributions of our work include a novel precise target localization (PTL) technique that locates reflected signals despite spatial disruptions, and an encoder-decoder model that transforms these signals into HPWs, addressing temporal inconsistencies. Our evaluation on a large dataset of healthy subjects and arrhythmia patients shows that both mCardiacDx and PTL outperform state-of-the-art approach in arrhythmia monitoring and diagnosis, also demonstrating improved performance in healthy subjects.
Authors:Hyeok Kim, Jeffrey Heer
Abstract:
Visualization knowledge bases enable computational reasoning and recommendation over a visualization design space. These systems evaluate design trade-offs using numeric weights assigned to different features (e.g., binning a variable). Feature weights can be learned automatically by fitting a model to a collection of chart pairs, in which one chart is deemed preferable to the other. To date, labeled chart pairs have been drawn from published empirical research results; however, such pairs are not comprehensive, resulting in a training corpus that lacks many design variants and fails to systematically assess potential trade-offs. To improve knowledge base coverage and accuracy, we contribute data augmentation techniques for generating and labeling chart pairs. We present methods to generate novel chart pairs based on design permutations and by identifying under-assessed features -- leading to an expanded corpus with thousands of new chart pairs, now in need of labels. Accordingly, we next compare varied methods to scale labeling efforts to annotate chart pairs, in order to learn updated feature weights. We evaluate our methods in the context of the Draco knowledge base, demonstrating improvements to both feature coverage and chart recommendation performance.
Authors:Zhuangze Hou, Jingze Tian, Nianlong Li, Farong Ren, Can Liu
Abstract:
Mixed reality platforms allow users to create virtual environments, yet novice users struggle with both ideation and execution in spatial design. While existing AI models can automatically generate scenes based on user prompts, the lack of interactive control limits users' ability to iteratively steer the output. In this paper, we present EchoLadder, a novel human-AI collaboration pipeline that leverages large vision-language model (LVLM) to support interactive scene modification in virtual reality. EchoLadder accepts users' verbal instructions at varied levels of abstraction and spatial specificity, generates concrete design suggestions throughout a progressive design process. The suggestions can be automatically applied, regenerated and retracted by users' toggle control.Our ablation study showed effectiveness of our pipeline components. Our user study found that, compared to baseline without showing suggestions, EchoLadder better supports user creativity in spatial design. It also contributes insights on users' progressive design strategies under AI assistance, providing design implications for future systems.
Authors:Mansi Sharma, Antonio Kruger
Abstract:
Robot-assisted surgery has revolutionized the healthcare industry by providing surgeons with greater precision, reducing invasiveness, and improving patient outcomes. However, the success of these surgeries depends heavily on the robotic system ability to accurately interpret the intentions of the surgical trainee or even surgeons. One critical factor impacting intent recognition is the cognitive workload experienced during the procedure. In our recent research project, we are building an intelligent adaptive system to monitor cognitive workload and improve learning outcomes in robot-assisted surgery. The project will focus on achieving a semantic understanding of surgeon intents and monitoring their mental state through an intelligent multi-modal assistive framework. This system will utilize brain activity, heart rate, muscle activity, and eye tracking to enhance intent recognition, even in mentally demanding situations. By improving the robotic system ability to interpret the surgeons intentions, we can further enhance the benefits of robot-assisted surgery and improve surgery outcomes.
Authors:Shiyao Zhang, Omar Faruk, Robert Porzel, Dennis Küster, Tanja Schultz, Hui Liu
Abstract:
An increasing number of online interaction settings now provide the possibility to visually represent oneself via an animated avatar instead of a video stream. Benefits include protecting the communicator's privacy while still providing a means to express their individuality. In consequence, there has been a surge in means for avatar-based personalization, ranging from classic human representations to animals, food items, and more. However, using avatars also has drawbacks. Depending on the human-likeness of the avatar and the corresponding disparities between the avatar and the original expresser, avatars may elicit discomfort or even hinder effective nonverbal communication by distorting emotion perception. This study examines the relationship between the human-likeness of virtual avatars and emotion perception for Ekman's six "basic emotions". Research reveals that avatars with varying degrees of human-likeness have distinct effects on emotion perception. High human-likeness avatars, such as human avatars, tend to elicit more negative emotional responses from users, a phenomenon that is consistent with the concept of Uncanny Valley in aesthetics, which suggests that closely resembling humans can provoke negative emotional responses. Conversely, a raccoon avatar and a shark avatar, known as cuteness, which exhibit moderate human similarity in this study, demonstrate a positive influence on emotion perception. Our initial results suggest that the human-likeness of avatars is an important factor for emotion perception. The results from the follow-up study further suggest that the cuteness of avatars and their natural facial status may also play a significant role in emotion perception and elicitation. We discuss practical implications for strategically conveying specific human behavioral messages through avatars in multiple applications, such as business and counseling.
Authors:Jing Tang, Qing Xiao, Kunxu Du, Zaiqiao Ye
Abstract:
We present RoboLinker, a generative design system that creates matching outfits for humans and their robots. Using a diffusion-based model, the system takes a robot image and a style prompt from users as input, and outputs a human outfit that visually complements the robot's attire. Through an interactive interface, users can refine the generated designs. We evaluate RoboLinker with both humanoid and pet-like robots, demonstrating its capacity to produce stylistically coherent and emotionally resonant results.
Authors:Nadja R. Ging-Jehli, Russell K. Childers, Joshua Lu, Robert Gemma, Rachel Zhu
Abstract:
How do we learn when to persist, when to let go, and when to shift gears? Gearshift Fellowship (GF) is the prototype of a new Supertask paradigm designed to model how humans and artificial agents adapt to shifting environment demands. Grounded in cognitive neuroscience, computational psychiatry, economics, and artificial intelligence, Supertasks combine computational neurocognitive modeling with serious gaming. This creates a dynamic, multi-mission environment engineered to assess mechanisms of adaptive behavior across cognitive and social contexts. Computational parameters explain behavior and probe mechanisms by controlling the game environment. Unlike traditional tasks, GF enables neurocognitive modeling of individual differences across perceptual decisions, learning, and meta-cognitive levels. This positions GF as a flexible testbed for understanding how cognitive-affective control processes, learning styles, strategy use, and motivational shifts adapt across contexts and over time. It serves as an experimental platform for scientists, a phenotype-to-mechanism intervention for clinicians, and a training tool for players aiming to strengthen self-regulated learning, mood, and stress resilience. Online study (n = 60, ongoing) results show that GF recovers effects from traditional neuropsychological tasks (construct validity), uncovers novel patterns in how learning differs across contexts and how clinical features map onto distinct adaptations. These findings pave the way for developing in-game interventions that foster self-efficacy and agency to cope with real-world stress and uncertainty. GF builds a new adaptive ecosystem designed to accelerate science, transform clinical care, and foster individual growth. It offers a mirror and training ground where humans and machines co-develop together deeper flexibility and awareness.
Authors:Agniva Banerjee, Bhanu Partap Paregi, Haroon R. Lone
Abstract:
Monitoring sleep posture and behavior is critical for diagnosing sleep disorders and improving overall sleep quality. However, traditional approaches, such as wearable devices, cameras, and pressure sensors, often compromise user comfort, fail under obstructions like blankets, and raise privacy concerns. To overcome these limitations, we present RestAware, a non-invasive, contactless sleep monitoring system based on a 24GHz frequency-modulated continuous wave (FMCW) radar. Our system is evaluated on 25 participants across eight common sleep postures, achieving 92% classification accuracy and an F1-score of 0.91 using a K-Nearest Neighbors (KNN) classifier. In addition, we integrate instruction-tuned large language models (Mistral, Llama, and Falcon) to generate personalized, human-readable sleep summaries from radar-derived posture data. This low-cost ($ 35), privacy-preserving solution offers a practical alternative for real-time deployment in smart homes and clinical environments.
Authors:Sofia Sahab, Jawad Haqbeen, Diksha Sapkota, Takayuki Ito
Abstract:
In this study, we investigated the effects of GPT-4, with and without specific conversational instructions, on the mental health of Afghan women. These women face multifaceted challenges, including Taliban-imposed restrictions, societal inequalities, and domestic violence, adversely affecting their well-being. We conducted a randomized controlled trial with 60 participants, dividing them into three groups: GPT-4, a supportive listener (GPT-4 with empathetic engagement instructions), and a waiting list. The Hospital Anxiety and Depression Scale (HADS) was used to measure anxiety and depression before and after the intervention. Linguistic analysis of chat data examined personal pronouns, tones, emotions, and Language Style Matching (LSM). The supportive listener group showed a significant reduction in HADS scores compared to the other groups. Linguistic analysis revealed a more positive tone and higher LSM in the supportive listener group, with a significant negative correlation between LSM and changes in HADS scores, indicating greater linguistic alignment was linked to reductions in anxiety and depression. Perceived empathy ratings were also significantly higher in the supportive listener group. These findings highlight the potential of AI-driven interventions, like GPT-4, in providing accessible mental health support. However, such interventions should complement traditional psychotherapy, ensuring a collaborative approach to optimize therapeutic outcomes.
Authors:Banan Alkhateeb, Ellis Solaiman
Abstract:
Social media platforms today strive to improve user experience through AI recommendations, yet the value of such recommendations vanishes as users do not understand the reasons behind them. This issue arises because explainability in social media is general and lacks alignment with user-specific needs. In this vision paper, we outline a user-segmented and context-aware explanation layer by proposing a visual explanation system with diverse explanation methods. The proposed system is framed by the variety of user needs and contexts, showing explanations in different visualized forms, including a technically detailed version for AI experts and a simplified one for lay users. Our framework is the first to jointly adapt explanation style (visual vs. numeric) and granularity (expert vs. lay) inside a single pipeline. A public pilot with 30 X users will validate its impact on decision-making and trust.
Authors:Maryam Mosleh, Marie Devlin, Ellis Solaiman
Abstract:
Artificial intelligence-driven adaptive learning systems are reshaping education through data-driven adaptation of learning experiences. Yet many of these systems lack transparency, offering limited insight into how decisions are made. Most explainable AI (XAI) techniques focus on technical outputs but neglect user roles and comprehension. This paper proposes a hybrid framework that integrates traditional XAI techniques with generative AI models and user personalisation to generate multimodal, personalised explanations tailored to user needs. We redefine explainability as a dynamic communication process tailored to user roles and learning goals. We outline the framework's design, key XAI limitations in education, and research directions on accuracy, fairness, and personalisation. Our aim is to move towards explainable AI that enhances transparency while supporting user-centred experiences.
Authors:Jacqueline Elise Bruen, Myounghoon Jeon
Abstract:
With the development of generative artificial intelligence (GenAI) tools to create art, stakeholders cannot come to an agreement on the value of these works. In this study we uncovered the mixed opinions surrounding art made by AI. We developed two versions of a dance performance augmented by technology either with or without GenAI. For each version we informed audiences of the performance's development either before or after a survey on their perceptions of the performance. There were thirty-nine participants (13 males, 26 female) divided between the four performances. Results demonstrated that individuals were more inclined to attribute artistic merit to works made by GenAI when they were unaware of its use. We present this case study as a call to address the importance of utilizing the social context and the users' interpretations of GenAI in shaping a technical explanation, leading to a greater discussion that can bridge gaps in understanding.
Authors:Brian Houck, Travis Lowdermilk, Cody Beyer, Steven Clarke, Ben Hanrahan
Abstract:
As artificial intelligence (AI) tools become increasingly embedded in software development workflows, questions persist about their true impact on developer productivity and experience. This paper presents findings from a mixed-methods study examining how developers perceive AI's influence across the dimensions of the SPACE framework: Satisfaction, Performance, Activity, Collaboration and Efficiency. Drawing on survey responses from over 500 developers and qualitative insights from interviews and observational studies, we find that AI is broadly adopted and widely seen as enhancing productivity, particularly for routine tasks. However, the benefits vary, depending on task complexity, individual usage patterns, and team-level adoption. Developers report increased efficiency and satisfaction, with less evidence of impact on collaboration. Organizational support and peer learning play key roles in maximizing AI's value. These findings suggest that AI is augmenting developers rather than replacing them, and that effective integration depends as much on team culture and support structures as on the tools themselves. We conclude with practical recommendations for teams, organizations and researchers seeking to harness AI's potential in software engineering.
Authors:Ziqing Xu, Nick Bryan-Kinns
Abstract:
Many existing AI music generation tools rely on text prompts, complex interfaces, or instrument-like controls, which may require musical or technical knowledge that non-musicians do not possess. This paper introduces DeformTune, a prototype system that combines a tactile deformable interface with the MeasureVAE model to explore more intuitive, embodied, and explainable AI interaction. We conducted a preliminary study with 11 adult participants without formal musical training to investigate their experience with AI-assisted music creation. Thematic analysis of their feedback revealed recurring challenge--including unclear control mappings, limited expressive range, and the need for guidance throughout use. We discuss several design opportunities for enhancing explainability of AI, including multimodal feedback and progressive interaction support. These findings contribute early insights toward making AI music systems more explainable and empowering for novice users.
Authors:Sophia Liu, Shm Garanganao Almeda
Abstract:
Today's algorithm-driven interfaces, from recommendation feeds to GenAI tools, often prioritize engagement and efficiency at the expense of user agency. As systems take on more decision-making, users have less control over what they see and how meaning or relationships between content are constructed. This paper introduces "Hypertextual Friction," a conceptual design stance that repositions classical hypertext principles--friction, traceability, and structure--as actionable values for reclaiming agency in algorithmically mediated environments. Through a comparative analysis of real-world interfaces--Wikipedia vs. Instagram Explore, and Are.na vs. GenAI image tools--we examine how different systems structure user experience, navigation, and authorship. We show that hypertext systems emphasize provenance, associative thinking, and user-driven meaning-making, while algorithmic systems tend to obscure process and flatten participation. We contribute: (1) a comparative analysis of how interface structures shape agency in user-driven versus agent-driven systems, and (2) a conceptual stance that offers hypertextual values as design commitments for reclaiming agency in an increasingly algorithmic web.
Authors:Sebastian Gürtl, Gloria Schimetta, David Kerschbaumer, Michael Liut, Alexander Steinmaurer
Abstract:
UML and ER diagrams are foundational in computer science education but come with challenges for learners due to the need for abstract thinking, contextual understanding, and mastery of both syntax and semantics. These complexities are difficult to address through traditional teaching methods, which often struggle to provide scalable, personalized feedback, especially in large classes. We introduce DUET (Diagrammatic UML & ER Tutor), a prototype of an LLM-based tool, which converts a reference diagram and a student-submitted diagram into a textual representation and provides structured feedback based on the differences. It uses a multi-stage LLM pipeline to compare diagrams and generate reflective feedback. Furthermore, the tool enables analytical insights for educators, aiming to foster self-directed learning and inform instructional strategies. We evaluated DUET through semi-structured interviews with six participants, including two educators and four teaching assistants. They identified strengths such as accessibility, scalability, and learning support alongside limitations, including reliability and potential misuse. Participants also suggested potential improvements, such as bulk upload functionality and interactive clarification features. DUET presents a promising direction for integrating LLMs into modeling education and offers a foundation for future classroom integration and empirical evaluation.
Authors:Jorge Ruiz Gómez, Lidia Andrés Susinos, Jorge Alamo Olivé, Sonia Rey Osorno, Manuel Luis Gonzalez Hernández
Abstract:
This paper presents the design, implementation, and evaluation behind a Large Language Model (LLM) agent that chats with an industrial production-grade ERP system. The agent is capable of interpreting natural language queries and translating them into executable SQL statements, leveraging open-weight LLMs. A novel dual-agent architecture combining reasoning and critique stages was proposed to improve query generation reliability.
Authors:Junyong Park, Saelyne Yang, Sungho Jo
Abstract:
Wearable technology has transformed sports analytics, offering new dimensions in enhancing player experience. Yet, many solutions involve cumbersome setups that inhibit natural motion. In tennis, existing products require sensors on the racket or dominant arm, causing distractions and discomfort. We propose Silent Impact, a novel and user-friendly system that analyzes tennis shots using a sensor placed on the passive arm. Collecting Inertial Measurement Unit sensor data from 20 recreational tennis players, we developed neural networks that exclusively utilize passive arm data to detect and classify six shots, achieving a classification accuracy of 88.2% and a detection F1 score of 86.0%, comparable to the dominant arm. These models were then incorporated into an end-to-end prototype, which records passive arm motion through a smartwatch and displays a summary of shots on a mobile app. User study (N=10) showed that participants felt less burdened physically and mentally using Silent Impact on the passive arm. Overall, our research establishes the passive arm as an effective, comfortable alternative for tennis shot analysis, advancing user-friendly sports analytics.
Authors:Hashim Hayat, Maksim Kudrautsau, Evgeniy Makarov, Vlad Melnichenko, Tim Tsykunou, Piotr Varaksin, Matt Pavelle, Adam Z. Oskowitz
Abstract:
Background: Globally we face a projected shortage of 11 million healthcare practitioners by 2030, and administrative burden consumes 50% of clinical time. Artificial intelligence (AI) has the potential to help alleviate these problems. However, no end-to-end autonomous large language model (LLM)-based AI system has been rigorously evaluated in real-world clinical practice. In this study, we evaluated whether a multi-agent LLM-based AI framework can function autonomously as an AI doctor in a virtual urgent care setting. Methods: We retrospectively compared the performance of the multi-agent AI system Doctronic and board-certified clinicians across 500 consecutive urgent-care telehealth encounters. The primary end points: diagnostic concordance, treatment plan consistency, and safety metrics, were assessed by blinded LLM-based adjudication and expert human review. Results: The top diagnosis of Doctronic and clinician matched in 81% of cases, and the treatment plan aligned in 99.2% of cases. No clinical hallucinations occurred (e.g., diagnosis or treatment not supported by clinical findings). In an expert review of discordant cases, AI performance was superior in 36.1%, and human performance was superior in 9.3%; the diagnoses were equivalent in the remaining cases. Conclusions: In this first large-scale validation of an autonomous AI doctor, we demonstrated strong diagnostic and treatment plan concordance with human clinicians, with AI performance matching and in some cases exceeding that of practicing clinicians. These findings indicate that multi-agent AI systems achieve comparable clinical decision-making to human providers and offer a potential solution to healthcare workforce shortages.
Authors:Ivan A. Hanono Cozzetti, Ahmad Abdou
Abstract:
The analysis of spatio-temporal data presents significant challenges due to the complexity and heterogeneity of movement patterns. This project proposes a data analytics tool that combines data visualization and statistical computation to facilitate spatio-temporal data analysis through a multi-level approach. The tool categorizes moving objects into distinct taxonomies using Machine Learning models, adding meaningful structure to the analysis. Two case studies demonstrate the methodology's effectiveness. The first analyzed Arctic fox trajectories, successfully identifying and labeling foxes with Geometric or Kinematic-based behaviors, further categorized into Curvature and Acceleration groups. Statistical indicators revealed that foxes with Acceleration-based behavior showed constant, steady acceleration, while those with Curvature-based behavior exhibited acceleration peaks and sudden deceleration. The second case study examined tropical cyclone data, labeling trajectories with Speed, Curvature, and hybrid Geometric-based behaviors through unique statistical variables. Analysis of hybrid Geometric behavior (Curvature and Indentation combined) identified specific angles with the highest impact on hurricane shape and geometry. The proposed method and tool demonstrate that spatio-temporal data, despite inherent complexity, can be analyzed and explained in detail, providing a theoretical and practical blueprint applicable to multiple domains.
Authors:Ismail Hossain, Mridul Banik
Abstract:
Conventional augmentative and alternative communication (AAC) systems and language-learning platforms often fail to adapt in real time to the user's cognitive and linguistic needs, especially in neurological conditions such as post-stroke aphasia or amyotrophic lateral sclerosis. Recent advances in noninvasive electroencephalography (EEG)--based brain-computer interfaces (BCIs) and transformer--based large language models (LLMs) offer complementary strengths: BCIs capture users' neural intent with low fatigue, while LLMs generate contextually tailored language content. We propose and evaluate a novel hybrid framework that leverages real-time EEG signals to drive an LLM-powered language rehabilitation assistant. This system aims to: (1) enable users with severe speech or motor impairments to navigate language-learning modules via mental commands; (2) dynamically personalize vocabulary, sentence-construction exercises, and corrective feedback; and (3) monitor neural markers of cognitive effort to adjust task difficulty on the fly.
Authors:Jérôme Ferrari, Benoit Delinchant, Frédéric Wurtz, Olga Rouchouze
Abstract:
As part of the energy transition and the rise in energy prices, the number of collective self-consumption operations in France is steadily increasing. However, energy flow monitoring currently relies on historical ''day+1'' data provided by Linky meters, which does not offer real time feedback to help participants adapt their energy consumption behaviors. This article introduces a new open-source infrastructure for real-time monitoring based on Linky meter data, enabling participants to make informed decisions and take timely actions. It includes a description of the xKy device, applied to a collective self-consumption operation involving nine participants, supported by the Energy Transition Observatory (OTE). The project encompasses the implementation of gateways in participants' homes and the development and operation of real-time monitoring website, aimed at increasing participants' self-consumption rate.
Authors:Javier Jimenez-Honrado, Javier Gomez Garcia, Felipe Costa-Tebar, Felix A. Marco, Jose A. Gallud, Gabriel Sebastian Rivera
Abstract:
In spite of all advances promoted by information technologies, there are still activities where this technology is not applied for reasons such as being carried out in non-profit organizations or because they have not adapted to this modernization. Until recently, the way to work with mobile devices was either by connecting through a web page with the device's browser, or by downloading an application from the corresponding platform. But lately, technologies are being developed that aim to break with this, as in the case of Progressive Web Applications (PWA). One of the advantages offered by PWA is to access the web page and install it as an application on the device. The purpose of this article is to design a progressive Web application for the support of Storytelling Therapy, one of the novel therapies applied in the field of mental health. In addition to providing a software application to enhance Storytelling Therapy workshops, it is also intended to analyze and verify the advantages of PWA in a real case.
Authors:A. E. Fuentes-Cortázar, A. Rivera-Hernández, J. R. Rojano-Cáceres
Abstract:
User Experience (UX) evaluation methods that are commonly used with hearing users may not be functional or effective for Deaf users. This is because these methods are primarily designed for users with hearing abilities, which can create limitations in the interaction, perception, and understanding of the methods for Deaf individuals. Furthermore, traditional UX evaluation approaches often fail to address the unique accessibility needs of Deaf users, resulting in an incomplete or biased assessment of their user experience. This research focused on analyzing a set of UX evaluation methods recommended for use with Deaf users, with the aim of validating the accessibility of each method through findings and limitations. The results indicate that, although these evaluation methods presented here are commonly recommended in the literature for use with Deaf users, they present various limitations that must be addressed in order to better adapt to the communication skills specific to the Deaf community. This research concludes that evaluation methods must be adapted to ensure accessible software evaluation for Deaf individuals, enabling the collection of data that accurately reflects their experiences and needs.
Authors:Adel Sabour, Ahmed Gadallah, Hesham Hefny
Abstract:
This paper presents a two-dimension fuzzy set based approach for matching touch-based gestures using fuzzy cued click point technique. The pro posed approach aims mainly to improve the acceptance of the most closed inac curate hand drawn gestures generated by the user compared with a predefined referenced gesture value that is stored in the user profile. Commonly, gestures are used in order to facilitate the interactive capabilities between humans and computerized systems. Unfortunately, most of current gesturing techniques don't deal at the same level of inaccuracy of gesturing, resulted from the nature of hu man fingers and hands movements. This paper aims, in a more flexible manner, to tackle the inaccuracy problem existed with gesture-based interactions between humans and a computerized system.
Authors:ZhaoBin Li, Mark Steyvers
Abstract:
In settings where human decision-making relies on AI input, both the predictive accuracy of the AI system and the reliability of its confidence estimates influence decision quality. We highlight the role of AI metacognitive sensitivity -- its ability to assign confidence scores that accurately distinguish correct from incorrect predictions -- and introduce a theoretical framework for assessing the joint impact of AI's predictive accuracy and metacognitive sensitivity in hybrid decision-making settings. Our analysis identifies conditions under which an AI with lower predictive accuracy but higher metacognitive sensitivity can enhance the overall accuracy of human decision making. Finally, a behavioral experiment confirms that greater AI metacognitive sensitivity improves human decision performance. Together, these findings underscore the importance of evaluating AI assistance not only by accuracy but also by metacognitive sensitivity, and of optimizing both to achieve superior decision outcomes.
Authors:Erin Gatz, Yasmine Kotturi, Andrea Afua Kwamya, Sarah Fox
Abstract:
Feminist makerspaces offer community led alternatives to dominant tech cultures by centering care, mutual aid, and collective knowledge production. While prior CSCW research has explored their inclusive practices, less is known about how these spaces sustain themselves over time. Drawing on interviews with 18 founders and members across 8 U.S. feminist makerspaces as well as autoethnographic reflection, we examine the organizational and relational practices that support long-term endurance. We find that sustainability is not achieved through growth or institutionalization, but through care-driven stewardship, solidarity with local justice movements, and shared governance. These social practices position feminist makerspaces as prefigurative counterspaces - sites that enact, rather than defer, feminist values in everyday practice. This paper offers empirical insight into how feminist makerspaces persist amid structural precarity, and highlights the forms of labor and coalition-building that underpin alternative sociotechnical infrastructures.
Authors:Owen Hoffman, Kangze Peng, Zehua You, Sajid Kamal, Sukrit Venkatagiri
Abstract:
Generative AI, including large language models (LLMs) have the potential -- and already are being used -- to increase the speed, scale, and types of unsafe conversations online. LLMs lower the barrier for entry for bad actors to create unsafe conversations in particular because of their ability to generate persuasive and human-like text. In our current work, we explore ways to promote online safety by teaching people about unsafe conversations that can occur online with and without LLMs. We build on prior work that shows that LLMs can successfully simulate scam conversations. We also leverage research in the learning sciences that shows that providing feedback on one's hypothetical actions can promote learning. In particular, we focus on simulating scam conversations using LLMs. Our work incorporates two LLMs that converse with each other to simulate realistic, unsafe conversations that people may encounter online between a scammer LLM and a target LLM but users of our system are asked provide feedback to the target LLM.
Authors:Black Sun, Die, Hu
Abstract:
Remote fetal monitoring technologies are becoming increasingly common. Yet, most current systems offer limited interpretability, leaving expectant parents with raw cardiotocography (CTG) data that is difficult to understand. In this work, we present CTG-Insight, a multi-agent LLM system that provides structured interpretations of fetal heart rate (FHR) and uterine contraction (UC) signals. Drawing from established medical guidelines, CTG-Insight decomposes each CTG trace into five medically defined features: baseline, variability, accelerations, decelerations, and sinusoidal pattern, each analyzed by a dedicated agent. A final aggregation agent synthesizes the outputs to deliver a holistic classification of fetal health, accompanied by a natural language explanation. We evaluate CTG-Insight on the NeuroFetalNet Dataset and compare it against deep learning models and the single-agent LLM baseline. Results show that CTG-Insight achieves state-of-the-art accuracy (96.4%) and F1-score (97.8%) while producing transparent and interpretable outputs. This work contributes an interpretable and extensible CTG analysis framework.
Authors:Christian Meske, Tobias Hermanns, Esther von der Weiden, Kai-Uwe Loser, Thorsten Berger
Abstract:
Software development is undergoing a fundamental transformation as vibe coding becomes widespread, with large portions of contemporary codebases now being AI-generated. The disconnect between rapid adoption and limited conceptual understanding highlights the need for an inquiry into this emerging paradigm. Drawing on an intent perspective and historical analysis, we define vibe coding as a software development paradigm where humans and generative AI engage in collaborative flow to co-create software artifacts through natural language dialogue, shifting the mediation of developer intent from deterministic instruction to probabilistic inference. By intent mediation, we refer to the fundamental process through which developers translate their conceptual goals into representations that computational systems can execute. Our results show that vibe coding reconfigures cognitive work by redistributing epistemic labor between humans and machines, shifting the expertise in the software development process away from traditional areas such as design or technical implementation toward collaborative orchestration. We identify key opportunities, including democratization, acceleration, and systemic leverage, alongside risks, such as black box codebases, responsibility gaps, and ecosystem bias. We conclude with a research agenda spanning human-, technology-, and organization-centered directions to guide future investigations of this paradigm.
Authors:Swaroop Panda, Arun Kumar Sekar
Abstract:
Visualizations are essential tools for disseminating information regarding elections and their outcomes, potentially influencing public perceptions. Personas, delineating distinctive segments within the populace, furnish a valuable framework for comprehending the nuanced perspectives, requisites, and behaviors of diverse voter demographics. In this work, we propose making visualizations tailored to these personas to make election information easier to understand and more relevant. Using data from UK parliamentary elections and new developments in Large Language Models (LLMs), we create personas that encompass the diverse demographics, technological preferences, voting tendencies, and information consumption patterns observed among voters.Subsequently, we elucidate how these personas can inform the design of visualizations through specific design criteria. We then provide illustrative examples of visualization prototypes based on these criteria and evaluate these prototypes using these personas and LLMs. We finally propose some actionable insights based upon the framework and the different design artifacts.
Authors:Mariam Alsayyad, Fayadh Kadhem
Abstract:
This paper addresses the question of how able the current trends of Artificial Intelligence (AI) are in managing to take the responsibility of a full course of mathematics at a college level. The study evaluates this ability in four significant aspects, namely, creating a course syllabus, presenting selected material, answering student questions, and creating an assessment. It shows that even though the AI is strong in some important parts like organization and accuracy, there are still some human aspects that are far away from the current abilities of AI. There is still a hidden emotional part, even in science, that cannot be fulfilled by the AI in its current state. This paper suggests some recommendations to integrate the human and AI potentials to create better outcomes in terms of reaching the target of creating a full course of mathematics, at a university level, as best as possible.
Authors:JIaheng Wang, Yucun Zhong, Chengjie Huang, Lin Yao
Abstract:
Brain-computer interface (BCI) spellers can render a new communication channel independent of peripheral nervous system, which are especially valuable for patients with severe motor disabilities. However, current BCI spellers often require users to type intended utterances letter-by-letter while spelling errors grow proportionally due to inaccurate electroencephalogram (EEG) decoding, largely impeding the efficiency and usability of BCIs in real-world communication. In this paper, we present MindChat, a large language model (LLM)-assisted BCI speller to enhance BCI spelling efficiency by reducing users' manual keystrokes. Building upon prompt engineering, we prompt LLMs (GPT-4o) to continuously suggest context-aware word and sentence completions/predictions during spelling. Online copy-spelling experiments encompassing four dialogue scenarios demonstrate that MindChat saves more than 62\% keystrokes and over 32\% spelling time compared with traditional BCI spellers. We envision high-speed BCI spellers enhanced by LLMs will potentially lead to truly practical applications.
Authors:Nicholas Botti, Flora Haberkorn, Charlotte Hoopes, Shaun Khan
Abstract:
We utilize a within-subjects design with randomized task assignments to understand the effectiveness of using an AI retrieval augmented generation (RAG) tool to assist analysts with an information extraction and data annotation task. We replicate an existing, challenging real-world annotation task with complex multi-part criteria on a set of thousands of pages of public disclosure documents from global systemically important banks (GSIBs) with heterogeneous and incomplete information content. We test two treatment conditions. First, a "naive" AI use condition in which annotators use only the tool and must accept the first answer they are given. And second, an "interactive" AI treatment condition where annotators use the tool interactively, and use their judgement to follow-up with additional information if necessary. Compared to the human-only baseline, the use of the AI tool accelerated task execution by up to a factor of 10 and enhanced task accuracy, particularly in the interactive condition. We find that when extrapolated to the full task, these methods could save up to 268 hours compared to the human-only approach. Additionally, our findings suggest that annotator skill, not just with the subject matter domain, but also with AI tools, is a factor in both the accuracy and speed of task performance.
Authors:Yiling Zhao, Audrey Michal, Nithum Thain, Hari Subramonyam
Abstract:
As AI systems shape individual and societal decisions, fostering critical AI literacy is essential. Traditional approaches, such as blog articles, static lessons, and social media discussions, often fail to support deep conceptual understanding and critical engagement. This study examines whether interactive simulations can help learners think like a scientist by engaging them in hypothesis testing, experimentation, and direct observation of AI behavior. In a controlled study with 605 participants, we assess how interactive AI tutorials impact learning of key concepts such as fairness, dataset representativeness, and bias in language models. Results show that interactive simulations effectively enhance AI literacy across topics, supporting greater knowledge transfer and self-reported confidence, though engagement alone does not predict learning. This work contributes to the growing field of AI literacy education, highlighting how interactive, inquiry-driven methodologies can better equip individuals to critically engage with AI in their daily lives.
Authors:Bibeg Limbu, Irene-Angelica Chounta, Vilma Sukacke, Andromachi Filippidi, Chara Spyropoulou, Marianna Anagnostopoulou, Eleftheria Tsourlidaki, Nikos Karacapilidis
Abstract:
This paper explores the needs and expectations of educational stakeholders for AI (Artificial Intelligence)-enhanced learning environments. Data was collected following two-phased participatory workshops. The first workshop outlined stakeholders' profiles in terms of technical and pedagogical characteristics. The qualitative data collected was analysed using deductive thematic analysis with Activity Theory, explicating the user needs. The second workshop articulated expectations related to the integration of AI in education. Inductive thematic analysis of the second workshop led to the elicitation of users' expectations. We cross-examined the needs and expectations, identifying contradictions, to generate user requirements for emerging technologies. The paper provides suggestions for future design initiatives that incorporate AI in learning environments.
Authors:Joe Hasei, Yosuke Matsumoto, Hiroki Kawai, Yuko Okahisa, Manabu Takaki, Toshifumi Ozaki
Abstract:
This study assessed metaverse-based support groups designed to reduce social isolation and suicide risk among LGBTQ+ youths. Using the Cluster platform, enhanced anonymity, avatar-based self-expression, and accessibility were provided. Key findings showed that 79.2% chose avatars matching their gender identity, reporting high satisfaction (mean: 4.10/5) and low discomfort (mean: 1.79/5). Social confidence significantly improved in virtual spaces compared to real-world interactions (p<0.001), particularly among participants with initially low confidence, averaging an increase of 2.08 points. About half of the first-time participants were 16 or younger, highlighting potential for early intervention. The metaverse scored higher than real-world environments for safety/privacy (3.94/5), self-expression (4.02/5), and accessibility (4.21/5). Additionally, 73.6% reported feeling more accepted virtually. However, some highly confident individuals offline experienced mild adaptation challenges, averaging a confidence decrease of 0.58 points, indicating virtual support complements rather than replaces in-person services. These findings suggest metaverse-based support effectively lowers psychological barriers and provides affirming spaces, potentially reducing severe outcomes such as suicidal ideation. Future studies should focus on integrating virtual support with existing community and clinical frameworks to enhance long-term impacts.
Authors:Carlo A. Furia, Andrea Mocci
Abstract:
Games like Super Mario Maker 2 (SMM2) lower the barrier for casual users to become level designers. In this paper, we set out to analyze a vast amount of data about SMM2 user-written levels, in order to understand what factors affect a level's difficulty as experienced by other users. To this end, we perform two kinds of analyses: one based on regression models and one using natural language processing techniques. The main results shed light on which level characteristics (e.g., its style, popularity, timing) and which topics and sentiments have a consistent association with easier or harder levels. While none of our findings are startling, they help distill some key differences between easy and hard SMM2 levels, which, in turn, can pave the way for a better understanding of end-user level design.
Authors:Tianyi Li, Tanay Maheshwari, Alex Voelker
Abstract:
We present a case study of using generative user interfaces, or ``vibe coding,'' a method leveraging large language models (LLMs) for generating code via natural language prompts, to support rapid prototyping in user-centered design (UCD). Extending traditional UCD practices, we propose an AI-in-the-loop ideate-prototyping process. We share insights from an empirical experience integrating this process to develop an interactive data analytics interface for highway traffic engineers to effectively retrieve and analyze historical traffic data. With generative UIs, the team was able to elicit rich user feedback and test multiple alternative design ideas from user evaluation interviews and real-time collaborative sessions with domain experts. We discuss the advantages and pitfalls of vibe coding for bridging the gaps between design expertise and domain-specific expertise.
Authors:Narjes Pourjafarian, Zhenming Yang, Jeffrey I. Lipton, Benyamin Davaji, Gregory D. Abowd
Abstract:
Electronic waste (e-waste) is a growing global challenge, with millions of functional components discarded due to the difficulty of repair and reuse. Traditional circuit assembly relies on soldering, which creates semi-permanent bonds that limit component recovery and contribute to unnecessary waste. We introduce ProForm, a thermoforming approach for solder-free circuit prototyping. By encapsulating electronic components with pressure-formed thermoplastics, ProForm enables secure, reversible mounting without the need for solder or custom mechanical housings. This approach supports a wide range of substrates, including flexible, paper-based, and non-planar circuits, facilitating easy reuse, replacement, and rapid prototyping. We demonstrate ProForm's versatility to support prototyping practices. We show that ProFormed circuits exhibit good electrical performance and mechanical stability. While motivated by a need for sustainable electronics practices, ProForm has other significant advantages over traditional soldering.
Authors:Fabian Rücker, Torben Storch
Abstract:
Text input in extended reality (XR) applications remains inefficient and tedious. Most solutions are derived from the traditional keyboard layout, yet fail to translate its positive characteristics to the spatial digital realm. This limits the productive use of immersive technologies. In this work, we analyze physical keyboard input to identify key characteristics that facilitate its comfort, touch typing and high typing speeds. Building on these findings, we propose a novel pressure-based text input modality that transfers these characteristics into immersive space by substituting the two-dimensional QWERTY layout with a linear scale. This design facilitates a touch-typing-like experience, eliminating the need for visual guidance for proficient users. Our skill-based approach enables typing speeds of over 200 characters per minute. Additionally, it is suitable for discreet use in public spaces and everyday text-input tasks, since the proposed system requires virtually no hand or finger movements and resembles smartphone-based text input in appearance.
Authors:Geng-Xin Xu, Xiang Zuo, Ye Li
Abstract:
Emotion recognition from physiological data is crucial for mental health assessment, yet it faces two significant challenges: incomplete multi-modal signals and interference from body movements and artifacts. This paper presents a novel Multi-Masked Querying Network (MMQ-Net) to address these issues by integrating multiple querying mechanisms into a unified framework. Specifically, it uses modality queries to reconstruct missing data from incomplete signals, category queries to focus on emotional state features, and interference queries to separate relevant information from noise. Extensive experiment results demonstrate the superior emotion recognition performance of MMQ-Net compared to existing approaches, particularly under high levels of data incompleteness.
Authors:Tiffany Tseng, Katelyn Lam, Tiffany Lin Fu, Alekhya Maram
Abstract:
Multimodal Large Language Models (MLLMs) are beginning to empower new user experiences that can flexibly generate content from a range of inputs, including images, text, speech, and video. These capabilities have the potential to enrich learning by enabling users to capture and interact with information using a variety of modalities, but little is known about how educators envision how MLLMs might shape the future of learning experiences, what challenges diverse teachers encounter when interpreting how these models work, and what practical needs should be considered for successful implementation in educational contexts. We investigated educator perspectives through formative workshops with 12 K-12 educators, where participants brainstormed learning opportunities, discussed practical concerns for effective use, and prototyped their own MLLM-powered learning applications using Claude 3.5 and its Artifacts feature for previewing code-based output. We use case studies to illustrate two contrasting end-user approaches (teacher-and student-driven), and share insights about opportunities and concerns expressed by our participants, ending with implications for leveraging MLLMs for future learning experiences.
Authors:Khloud AL Jallad, Nada Ghneim, Ghaida Rebdawi
Abstract:
Natural Language Understanding (NLU) is a basic task in Natural Language Processing (NLP). The evaluation of NLU capabilities has become a trending research topic that attracts researchers in the last few years, resulting in the development of numerous benchmarks. These benchmarks include various tasks and datasets in order to evaluate the results of pretrained models via public leaderboards. Notably, several benchmarks contain diagnostics datasets designed for investigation and fine-grained error analysis across a wide range of linguistic phenomena. This survey provides a comprehensive review of available English, Arabic, and Multilingual NLU benchmarks, with a particular emphasis on their diagnostics datasets and the linguistic phenomena they covered. We present a detailed comparison and analysis of these benchmarks, highlighting their strengths and limitations in evaluating NLU tasks and providing in-depth error analysis. When highlighting the gaps in the state-of-the-art, we noted that there is no naming convention for macro and micro categories or even a standard set of linguistic phenomena that should be covered. Consequently, we formulated a research question regarding the evaluation metrics of the evaluation diagnostics benchmarks: "Why do not we have an evaluation standard for the NLU evaluation diagnostics benchmarks?" similar to ISO standard in industry. We conducted a deep analysis and comparisons of the covered linguistic phenomena in order to support experts in building a global hierarchy for linguistic phenomena in future. We think that having evaluation metrics for diagnostics evaluation could be valuable to gain more insights when comparing the results of the studied models on different diagnostics benchmarks.
Authors:Charu Tripathi, Manish Arora, Amaresh Chakrabarti
Abstract:
Direct measurement ergonomic assessment is reshaping occupational safety by facilitating highly reliable risk estimation. Industry 5.0, advocating human-centricity, has catalysed increasing adoption of direct measurement tools in manufacturing industries. However, due to technical and feasibility constraints in their practical implementations, especially within non routine manufacturing processes, task based approach to ergonomic assessment is utilized. Despite enabling operationalization of robust ergonomic assessment technologies within complicated industrial processes, task based approach raises several validity concerns. Hence, to ascertain functional utility of the resultant safety interventions, this study evaluates the construct validity of task based ergonomic assessment within non routine work utilizing Multitrait multimethod (MTMM) matrix followed by video-based content analysis. Ergonomic exposure traits were collected for 46 participants through direct measurement and self reported techniques utilizing inertial motion capture and Borg's RPE rating scale respectively. Findings include unsubstantiated convergent validity (low same trait correlations from 0.149 to 0.243) and weak evidence of discriminant validity with statistical significance (p value less than 0.001). The study also identifies three primary factors undermining construct validity through video based content analysis. Findings also elucidate misinterpretation of ergonomic risk and action levels. Therefore, practical implications entail underestimation of actual ergonomic risks when estimated through task based assessment. This highlights the need for enhancement in ergonomic assessment technologies focused on cumulative load analysis compatible within diverse industrial processes.
Authors:Aditya Sharma, Ananya Gupta, Chengyu Wang, Chiamaka Adebayo, Jakub Kowalski
Abstract:
Large Language Models (LLMs), despite their advanced linguistic capabilities, fundamentally lack an intuitive understanding of physical dynamics, which limits their effectiveness in real-world scenarios that require causal reasoning. In this paper, we introduce Causal World Model Induction (CWMI), a novel framework designed to embed an explicit model of causal physics within an LLM. Our approach incorporates a dedicated Causal Physics Module (CPM) and a new training objective called Causal Intervention Loss, encouraging the model to learn cause-and-effect relationships from multimodal data. By training the model to predict the outcomes of hypothetical interventions instead of merely capturing statistical correlations, CWMI develops a robust internal representation of physical laws. Experimental results show that CWMI significantly outperforms state-of-the-art LLMs on zero-shot physical reasoning tasks, including the PIQA benchmark and our newly proposed PhysiCa-Bench dataset. These findings demonstrate that inducing a causal world model is a critical step toward more reliable and generalizable AI systems.
Authors:Anjali R. Menon, Rohit K. Sharma, Priya Singh, Chengyu Wang, Aurora M. Ferreira, Mateja Novak
Abstract:
The integration of Large Language Models (LLMs) into robotics has unlocked unprecedented capabilities in high-level task planning. However, most current systems operate in an open-loop fashion, where LLMs act as one-shot planners, rendering them brittle and unable to adapt to unforeseen circumstances in dynamic physical environments. To overcome this limitation, this paper introduces the "Think, Act, Learn" (T-A-L) framework, a novel architecture that enables an embodied agent to autonomously learn and refine its policies through continuous interaction. Our framework establishes a closed-loop cycle where an LLM first "thinks" by decomposing high-level commands into actionable plans. The robot then "acts" by executing these plans while gathering rich, multimodal sensory feedback. Critically, the "learn" module processes this feedback to facilitate LLM-driven self-reflection, allowing the agent to perform causal analysis on its failures and generate corrective strategies. These insights are stored in an experiential memory to guide future planning cycles. We demonstrate through extensive experiments in both simulation and the real world that our T-A-L agent significantly outperforms baseline methods, including open-loop LLMs, Behavioral Cloning, and traditional Reinforcement Learning. Our framework achieves over a 97% success rate on complex, long-horizon tasks, converges to a stable policy in an average of just 9 trials, and exhibits remarkable generalization to unseen tasks. This work presents a significant step towards developing more robust, adaptive, and truly autonomous robotic agents.
Authors:Shayan S. Mousavi Masouleh, Corey A. Sanz, Ryan P. Jansonius, Cara Cronin, Jason E. Hein, Jason Hattrick-Simpers
Abstract:
As demand for high-purity lithium surges with the growth of the electric vehicle (EV) industry, cost-effective extraction from lower-grade North American sources like the Smackover Formation is critical. These resources, unlike high-purity South American brines, require innovative purification techniques to be economically viable. Continuous crystallization is a promising method for producing battery-grade lithium carbonate, but its optimization is challenged by a complex parameter space and limited data. This study introduces a Human-in-the-Loop (HITL) assisted active learning framework to optimize the continuous crystallization of lithium carbonate. By integrating human expertise with data-driven insights, our approach accelerates the optimization of lithium extraction from challenging sources. Our results demonstrate the framework's ability to rapidly adapt to new data, significantly improving the process's tolerance to critical impurities like magnesium from the industry standard of a few hundred ppm to as high as 6000 ppm. This breakthrough makes the exploitation of low-grade, impurity-rich lithium resources feasible, potentially reducing the need for extensive pre-refinement processes. By leveraging artificial intelligence, we have refined operational parameters and demonstrated that lower-grade materials can be used without sacrificing product quality. This advancement is a significant step towards economically harnessing North America's vast lithium reserves, such as those in the Smackover Formation, and enhancing the sustainability of the global lithium supply chain.
Authors:Rachel L. Draelos, Samina Afreen, Barbara Blasko, Tiffany L. Brazile, Natasha Chase, Dimple Patel Desai, Jessica Evert, Heather L. Gardner, Lauren Herrmann, Aswathy Vaikom House, Stephanie Kass, Marianne Kavan, Kirshma Khemani, Amanda Koire, Lauren M. McDonald, Zahraa Rabeeah, Amy Shah
Abstract:
Millions of patients are already using large language model (LLM) chatbots for medical advice on a regular basis, raising patient safety concerns. This physician-led red-teaming study compares the safety of four publicly available chatbots--Claude by Anthropic, Gemini by Google, GPT-4o by OpenAI, and Llama3-70B by Meta--on a new dataset, HealthAdvice, using an evaluation framework that enables quantitative and qualitative analysis. In total, 888 chatbot responses are evaluated for 222 patient-posed advice-seeking medical questions on primary care topics spanning internal medicine, women's health, and pediatrics. We find statistically significant differences between chatbots. The rate of problematic responses varies from 21.6 percent (Claude) to 43.2 percent (Llama), with unsafe responses varying from 5 percent (Claude) to 13 percent (GPT-4o, Llama). Qualitative results reveal chatbot responses with the potential to lead to serious patient harm. This study suggests that millions of patients could be receiving unsafe medical advice from publicly available chatbots, and further work is needed to improve the clinical safety of these powerful tools.
Authors:PaweÅ Niszczota, Tomasz Grzegorczyk, Alexander Pastukhov
Abstract:
Machines driven by large language models (LLMs) have the potential to augment humans across various tasks, a development with profound implications for business settings where effective communication, collaboration, and stakeholder trust are paramount. To explore how interacting with an LLM instead of a human might shift cooperative behavior in such settings, we used the Prisoner's Dilemma game -- a surrogate of several real-world managerial and economic scenarios. In Experiment 1 (N=100), participants engaged in a thirty-round repeated game against a human, a classic bot, and an LLM (GPT, in real-time). In Experiment 2 (N=192), participants played a one-shot game against a human or an LLM, with half of them allowed to communicate with their opponent, enabling LLMs to leverage a key advantage over older-generation machines. Cooperation rates with LLMs -- while lower by approximately 10-15 percentage points compared to interactions with human opponents -- were nonetheless high. This finding was particularly notable in Experiment 2, where the psychological cost of selfish behavior was reduced. Although allowing communication about cooperation did not close the human-machine behavioral gap, it increased the likelihood of cooperation with both humans and LLMs equally (by 88%), which is particularly surprising for LLMs given their non-human nature and the assumption that people might be less receptive to cooperating with machines compared to human counterparts. Additionally, cooperation with LLMs was higher following prior interaction with humans, suggesting a spillover effect in cooperative behavior. Our findings validate the (careful) use of LLMs by businesses in settings that have a cooperative component.
Authors:Pingjing Yang, Jennifer Cromley, Jana Diesner
Abstract:
Understanding how novices acquire and hone visual search skills is crucial for developing and optimizing training methods across domains. Network analysis methods can be used to analyze graph representations of visual expertise. This study investigates the relationship between eye-gaze movements and learning outcomes among undergraduate dentistry students who were diagnosing dental radiographs over multiple semesters. We use network analysis techniques to model eye-gaze scanpaths as directed graphs and examine changes in network metrics over time. Using time series clustering on each metric, we identify distinct patterns of visual search strategies and explore their association with students' diagnostic performance. Our findings suggest that the network metric of transition entropy is negatively correlated with performance scores, while the number of nodes and edges as well as average PageRank are positively correlated with performance scores. Changes in network metrics for individual students over time suggest a developmental shift from intermediate to expert-level processing. These insights contribute to understanding expertise acquisition in visual tasks and can inform the design of AI-assisted learning interventions.
Authors:Yichen Yu, Qiaoran Wang
Abstract:
Children with hearing impairments face ongoing challenges in language and motor development. This study explores how multi-sensory feedback technology based on virtual reality (VR), integrating auditory, visual, and tactile stimuli, can enhance rehabilitation outcomes. Using functional near-infrared spectroscopy (fNIRS) technology, we assessed cortical activation patterns in children during pitch-matching tasks across different interaction modes. Our findings aim to provide evidence for designing personalized, interactive rehabilitation systems that enhance cognitive engagement and motor control in children with hearing impairments.
Authors:Andrew Jeyathasan, Swati Banerjee
Abstract:
We experience the world through multiple senses that work together to create a cohesive perception, whether in daily life or immersive technologies. Understanding this multisensory integration (MSI) requires examining the interactions between sensory modalities, each with unique temporal dynamics and characteristics. While most research focuses on unimodal or bimodal cues, the integration of three or more modalities remains underexplored. MSI studies must account for factors like cross-modal correspondence, congruence, cognitive load, and stimulus timing, which become increasingly complex as modalities multiply. This article examines these key factors and how they can be applied to 8 design effective MSI study protocols.
Authors:Rhys Jacka, Paola R. Peña, Sophie Leonard, Ãva Székely, Benjamin R. Cowan
Abstract:
Speech disfluencies play a role in perspective-taking and audience design in human-human communication (HHC), but little is known about their impact in human-machine dialogue (HMD). In an online Namer-Matcher task, sixty-one participants interacted with a speech agent using either fluent or disfluent speech. Participants completed a partner-modelling questionnaire (PMQ) both before and after the task. Post-interaction evaluations indicated that participants perceived the disfluent agent as more competent, despite no significant differences in pre-task ratings. However, no notable differences were observed in assessments of conversational flexibility or human-likeness. Our findings also reveal evidence of egocentric and allocentric language production when participants interact with speech agents. Interaction with disfluent speech agents appears to increase egocentric communication in comparison to fluent agents. Although the wide credibility intervals mean this effect is not clear-cut. We discuss potential interpretations of this finding, focusing on how disfluencies may impact partner models and language production in HMD.
Authors:Lorenzo Porcaro, Chiara Monaldi
Abstract:
Recommender systems shape music listening worldwide due to their widespread adoption in online platforms. Growing concerns about representational harms that these systems may cause are nowadays part of the scientific and public debate, wherein music listener perspectives are oftentimes reported and discussed from a cognitive-behaviorism perspective, but rarely contextualised under a psychosocial and cultural lens. We proceed in this direction, by interviewing a group of Italian music listeners and analysing their narratives through Emotional Textual Analysis. Thanks to this, we identify shared cultural repertoires that reveal people's complex relationship with listening practices: even when familiar with online platforms, listeners may still lack a critical understanding of recommender systems. Moreover, representational issues, particularly gender disparities, seem not yet fully grasped in the context of online music listening. This study underscores the need for interdisciplinary research to address representational harms, and the role of algorithmic awareness and digital literacy in developing trustworthy recommender systems.
Authors:Riddhi Heda, Sidhant Singh, Umair Yasir, Tanmay Jaiswal, Anil Mokhade
Abstract:
Traditional hostel management practices in academic institutions often suffer from inefficiencies, delays, and fragmented communication. These systems fail to meet the expectations of digitally native students and place a significant operational burden on hostel staff. This paper introduces DHMS (Digital Hostel Management System), a modular and integrated platform designed to digitize and streamline essential hostel management functions. DHMS leverages modern web technologies, artificial intelligence, and cloud infrastructure to automate room allotment, grievance redressal, gate pass logistics, and communication via a natural language chatbot. In simulation tests, DHMS achieved a 92% student satisfaction rate in room allocation and maintained an average chatbot response time below one second. Additional features include predictive analytics for proactive maintenance planning and sentiment analysis for feedback processing. While promising, the system requires further testing for integration across multiple hostel blocks, user acceptance, scalability under load, and ERP compatibility before campus-wide deployment. This work discusses the system architecture, implementation approach, and factors critical to improving user experience, administrative efficiency, and decision-making processes.
Authors:Pierre-Marie Chauvin, Angèle Merlin, Xavier Fresquet, Hugo Caselles-Dupré, Benjamin Simmenauer, Mathieu de Fayet
Abstract:
This paper explores the integration of generative AI into the fashion design process. Drawing on insights from the January 2025 seminar ``Tisser le futur,'' it investigates how AI reshapes creative workflows, from ideation to prototyping, while interrogating the ethical, aesthetic, and labor implications. The paper highlights co-creative dynamics between humans and machines, the potential for aesthetic innovation, and the environmental and cultural challenges of algorithmic design.
Authors:Jianfeng Lan, Yingjia Huang
Abstract:
In the digital era, social media platforms play a pivotal role in shaping adolescents' body image perceptions. This study examines how Douyin and WeChat, two contrasting Chinese social media platforms, influence body image among Chinese male adolescents. Employing a platformization perspective, we surveyed 395 male adolescents aged 10 to 24 using the Multidimensional Body-Self Relations Questionnaire-Appearance Scales (MBSRQ-AS) to assess self-evaluation and body satisfaction. Our findings reveal that Douyin usage is significantly correlated with appearance evaluation and body area satisfaction, while WeChat usage shows no significant correlation with any body image dimensions. These results suggest that Douyin's algorithm-driven, video-centric environment intensifies exposure to idealized body standards, impacting users at a cognitive level. This study underscores the importance of considering platform-specific characteristics in understanding social media's impact on body image. It contributes to the broader discourse on how technological design and content modalities mediate psychological outcomes, offering insights for addressing body image concerns among male adolescents in China.
Authors:Justin Morse, Kurt Gilbert, Kyle Shin, Rick Cooke, Peyton Rose, Jack Sullivan, Angelo Sisante
Abstract:
Clinician burnout has motivated the growing adoption of ambient medical scribes in the clinic. In this work, we introduce a custom-built ambient scribe application integrated into the EHR system at Included Health, a personalized all-in-one healthcare company offering telehealth services. The application uses Whisper for transcription and a modular in-context learning pipeline with GPT-4o to automatically generate SOAP notes and patient instructions. Testing on mock visit data shows that the notes generated by the application exceed the quality of expert-written notes as determined by an LLM-as-a-judge. The application has been widely adopted by the clinical practice, with over 540 clinicians at Included Health using the application at least once. 94% (n = 63) of surveyed clinicians report reduced cognitive load during visits and 97% (n = 66) report less documentation burden when using the application. Additionally, we show that post-processing notes with a fine-tuned BART model improves conciseness. These findings highlight the potential for AI systems to ease administrative burdens and support clinicians in delivering efficient, high-quality care.
Authors:Mohammad Nur Hossain Khan, David creswell, Jordan Albert, Patrick O'Connell, Shawn Fallon, Mathew Polowitz, Xuhai "orson" Xu, Bashima islam
Abstract:
Mindfulness training is widely recognized for its benefits in reducing depression, anxiety, and loneliness. With the rise of smartphone-based mindfulness apps, digital meditation has become more accessible, but sustaining long-term user engagement remains a challenge. This paper explores whether respiration biosignal feedback and mindfulness skill estimation enhance system usability and skill development. We develop a smartphone's accelerometer-based respiration tracking algorithm, eliminating the need for additional wearables. Unlike existing methods, our approach accurately captures slow breathing patterns typical of mindfulness meditation. Additionally, we introduce the first quantitative framework to estimate mindfulness skills-concentration, sensory clarity, and equanimity-based on accelerometer-derived respiration data. We develop and test our algorithms on 261 mindfulness sessions in both controlled and real-world settings. A user study comparing an experimental group receiving biosignal feedback with a control group using a standard app shows that respiration feedback enhances system usability. Our respiration tracking model achieves a mean absolute error (MAE) of 1.6 breaths per minute, closely aligning with ground truth data, while our mindfulness skill estimation attains F1 scores of 80-84% in tracking skill progression. By integrating respiration tracking and mindfulness estimation into a commercial app, we demonstrate the potential of smartphone sensors to enhance digital mindfulness training.
Authors:Lizhu Zhang, Cecilia X. Wang
Abstract:
Artificial intelligence has deeply permeated numerous fields, especially the design area which relies on technology as a tool for innovation. This change naturally extends to the field of design education, which is closest to design practice. This has led to further exploration of the impact of AI on college-level education in the design discipline. This study aims to examine how current design educators perceive the role of AI in college-level design education, their perspectives on integrating AI into teaching and research, and their concerns regarding its potential challenges in design education and research. Through qualitative, semi-structured, in-depth interviews with seven faculties in U.S. design colleges, the findings reveal that AI, as a tool and source of information, has become an integral part of design education. AI- derived functionalities are increasingly utilized in design software, and educators are actively incorporating AI as a theoretical framework in their teaching. Educators can guide students in using AI tools, but only if they first acquire a strong foundation in basic design principles and skills. This study also indicates the importance of promoting a cooperative relationship between design educators and AI. At the same time, educators express anticipation for advancements in ethical standards, authenticity, and the resolution of copyright issues related to AI.
Authors:Yuet Ling Wong, Niklas Elmqvist
Abstract:
Discrete event sequences serve as models for numerous real-world datasets, including publications over time, project milestones, and medication dosing during patient treatments. These event sequences typically exhibit bursty behavior, where events cluster together in rapid succession, interspersed with periods of inactivity. Standard timeline charts with linear time axes fail to adequately represent such data, resulting in cluttered regions during event bursts while leaving other areas unutilized. We introduce EventLines, a novel technique that dynamically adjusts the time scale to match the underlying event distribution, enabling more efficient use of screen space. To address the challenges of non-linear time scaling, EventLines employs the time axis's visual representation itself to communicate the varying scale. We present findings from a crowdsourced graphical perception study that examines how different time scale representations influence temporal perception.
Authors:Clara Scalzer, Saurav Pokhrel, Sara Hunt, Greg L Nelson
Abstract:
Students continue their education when they feel their learning is meaningful and relevant for their future careers. Computing educators now face the challenge of preparing students for careers increasingly shaped by generative AI (GenAI) with the goals of supporting their learning, motivation, ethics, and career development. Our longitudinal qualitative study of students in a GenAI-integrated creative media course shows how this is a "wicked" problem: progress on one goal can then impede progress on other goals. Students developed concerning patterns despite extensive instruction in critical and ethical GenAI use including prompt engineering, ethics and bias, and industry panels on GenAI's career impact. We present an analysis of two students' experiences to showcase this complexity. Increasing GenAI use skills can lower ethics; for example, Pat started from purposefully avoiding GenAI use, to dependency. He described himself as a "notorious cheater" who now uses GenAi to "get all the right answers" while acknowledging he's learning less. Increasing ethical awareness can lower the learning of GenAI use skills; for example, Jay's newfound environmental concerns led to self-imposed usage limits that impeded skill development, and new serious fears that GenAI would eliminate creative careers they had been passionate about. Increased GenAI proficiency, a potential career skill, did not improve their career confidence. These findings suggest that supporting student development in the GenAI era is a "wicked" problem requiring multi-dimensional evaluation and design, rather than optimizing learning, GenAI skills, ethics, or career motivation individually.
Authors:Sarah "Magz" Fernandez, Greg L Nelson
Abstract:
Generative AI is disrupting computing education. Most interventions focus on teaching GenAI use rather than helping students understand how AI changes their programming process. We designed and deployed a novel comparative video reflection assignment adapting the Describe, Examine, then Articulate Learning (DEAL) framework. In an introductory software engineering course, students recorded themselves programming during their team project two times: first without, then with using generative AI. Students then analyzed their own videos using a scaffolded set of reflection questions, including on their programming process and human, internet, and AI help-seeking. We conducted a qualitative thematic analysis of the reflections, finding students developed insights about planning, debugging, and help-seeking behaviors that transcended AI use. Students reported learning to slow down and understand before writing or generating code, recognized patterns in their problem-solving approaches, and articulated specific process improvements. Students also learned and reflected on AI limits and downsides, and strategies to use AI more critically, including better prompting but also to benefit their learning instead of just completing tasks. Unexpectedly, the comparative reflection also scaffolded reflection on programming not involving AI use, and even led to students spontaneously setting future goals to adopt video and other regular reflection. This work demonstrates structured reflection on programming session videos can develop metacognitive skills essential for programming with and without generative AI and also lifelong learning in our evolving field.
Authors:Myeongwon Jung, Takanori Fujiwara, Jaemin Jo
Abstract:
Despite the widespread use of Uniform Manifold Approximation and Projection (UMAP), the impact of its stochastic optimization process on the results remains underexplored. We observed that it often produces unstable results where the projections of data points are determined mostly by chance rather than reflecting neighboring structures. To address this limitation, we introduce (r,d)-stability to UMAP: a framework that analyzes the stochastic positioning of data points in the projection space. To assess how stochastic elements, specifically initial projection positions and negative sampling, impact UMAP results, we introduce "ghosts", or duplicates of data points representing potential positional variations due to stochasticity. We define a data point's projection as (r,d)-stable if its ghosts perturbed within a circle of radius r in the initial projection remain confined within a circle of radius d for their final positions. To efficiently compute the ghost projections, we develop an adaptive dropping scheme that reduces a runtime up to 60% compared to an unoptimized baseline while maintaining approximately 90% of unstable points. We also present a visualization tool that supports the interactive exploration of the (r,d)-stability of data points. Finally, we demonstrate the effectiveness of our framework by examining the stability of projections of real-world datasets and present usage guidelines for the effective use of our framework.
Authors:Viktor Muryn, Marta Sumyk, Mariya Hirna, Sofiya Garkot, Maksym Shamrai
Abstract:
Desktop accessibility metadata enables AI agents to interpret screens and supports users who depend on tools like screen readers. Yet, many applications remain largely inaccessible due to incomplete or missing metadata provided by developers - our investigation shows that only 33% of applications on macOS offer full accessibility support. While recent work on structured screen representation has primarily addressed specific challenges, such as UI element detection or captioning, none has attempted to capture the full complexity of desktop interfaces by replicating their entire hierarchical structure. To bridge this gap, we introduce Screen2AX, the first framework to automatically create real-time, tree-structured accessibility metadata from a single screenshot. Our method uses vision-language and object detection models to detect, describe, and organize UI elements hierarchically, mirroring macOS's system-level accessibility structure. To tackle the limited availability of data for macOS desktop applications, we compiled and publicly released three datasets encompassing 112 macOS applications, each annotated for UI element detection, grouping, and hierarchical accessibility metadata alongside corresponding screenshots. Screen2AX accurately infers hierarchy trees, achieving a 77% F1 score in reconstructing a complete accessibility tree. Crucially, these hierarchy trees improve the ability of autonomous agents to interpret and interact with complex desktop interfaces. We introduce Screen2AX-Task, a benchmark specifically designed for evaluating autonomous agent task execution in macOS desktop environments. Using this benchmark, we demonstrate that Screen2AX delivers a 2.2x performance improvement over native accessibility representations and surpasses the state-of-the-art OmniParser V2 system on the ScreenSpot benchmark.
Authors:Siqi Liu, Guangrong Dai, Dechao Li
Abstract:
This preliminary study investigates the usefulness of sentence-level Quality Estimation (QE) in English-Chinese Machine Translation Post-Editing (MTPE), focusing on its impact on post-editing speed and student translators' perceptions. It also explores the interaction effects between QE and MT quality, as well as between QE and translation expertise. The findings reveal that QE significantly reduces post-editing time. The examined interaction effects were not significant, suggesting that QE consistently improves MTPE efficiency across medium- and high-quality MT outputs and among student translators with varying levels of expertise. In addition to indicating potentially problematic segments, QE serves multiple functions in MTPE, such as validating translators' evaluations of MT quality and enabling them to double-check translation outputs. However, interview data suggest that inaccurate QE may hinder post-editing processes. This research provides new insights into the strengths and limitations of QE, facilitating its more effective integration into MTPE workflows to enhance translators' productivity.
Authors:Lavinia Hriscu, Alberto Sanfeliu, Anais Garrell
Abstract:
The pursuit of artificial intelligence has long been associated to the the challenge of effectively measuring intelligence. Even if the Turing Test was introduced as a means of assessing a system intelligence, its relevance and application within the field of human-robot interaction remain largely underexplored. This study investigates the perception of intelligence in embodied robots by performing a Turing Test within a robotic platform. A total of 34 participants were tasked with distinguishing between AI- and human-operated robots while engaging in two interactive tasks: an information retrieval and a package handover. These tasks assessed the robot perception and navigation abilities under both static and dynamic conditions. Results indicate that participants were unable to reliably differentiate between AI- and human-controlled robots beyond chance levels. Furthermore, analysis of participant responses reveals key factors influencing the perception of artificial versus human intelligence in embodied robotic systems. These findings provide insights into the design of future interactive robots and contribute to the ongoing discourse on intelligence assessment in AI-driven systems.
Authors:Gautam Kishore Shahi, Scot A. Hale
Abstract:
WhatsApp tiplines, first launched in 2019 to combat misinformation, enable users to interact with fact-checkers to verify misleading content. This study analyzes 580 unique claims (tips) from 451 users, covering both high-resource languages (English, Hindi) and a low-resource language (Telugu) during the 2021 Indian assembly elections using a mixed-method approach. We categorize the claims into three categories, election, COVID-19, and others, and observe variations across languages. We compare content similarity through frequent word analysis and clustering of neural sentence embeddings. We also investigate user overlap across languages and fact-checking organizations. We measure the average time required to debunk claims and inform tipline users. Results reveal similarities in claims across languages, with some users submitting tips in multiple languages to the same fact-checkers. Fact-checkers generally require a couple of days to debunk a new claim and share the results with users. Notably, no user submits claims to multiple fact-checking organizations, indicating that each organization maintains a unique audience. We provide practical recommendations for using tiplines during elections with ethical consideration of users' information.
Authors:Lucas Jasper Jacobsen, Ute Mertens, Thorben Jansen, Kira Elena Weber
Abstract:
Feedback plays a central role in learning, yet pre-service teachers' engagement with feedback depends not only on its quality but also on their perception of the feedback content and source. Large Language Models (LLMs) are increasingly used to provide educational feedback; however, negative perceptions may limit their practical use, and little is known about how pre-service teachers' perceptions and behavioral responses differ by feedback source. This study investigates how the perceived source of feedback - LLM, expert, or peer - influences feedback perception and uptake, and whether recognition accuracy and feedback quality moderate these effects. In a randomized experiment with 273 pre-service teachers, participants received written feedback on a mathematics learning goal, identified its source, rated feedback perceptions across five dimensions (fairness, usefulness, acceptance, willingness to improve, positive and negative affect), and revised the learning goal according to the feedback (i.e. feedback uptake). Results revealed that LLM-generated feedback received the highest ratings in fairness and usefulness, leading to the highest uptake (52%). Recognition accuracy significantly moderated the effect of feedback source on perception, with particularly positive evaluations when LLM feedback was falsely ascribed to experts. Higher-quality feedback was consistently assigned to experts, indicating an expertise heuristic in source judgments. Regression analysis showed that only feedback quality significantly predicted feedback uptake. Findings highlight the need to address source-related biases and promote feedback and AI literacy in teacher education.
Authors:Pierluca D'Oro, Caley Drooff, Joy Chen, Joseph Tighe
Abstract:
Large language models have paved the way to powerful and flexible AI agents, assisting humans by increasingly integrating into their daily life. This flexibility, potential, and growing adoption demands a holistic and cross-disciplinary approach to developing, monitoring and discussing the capabilities required for agent-driven user experiences. However, current guidance on human-centered AI agent development is scattered: UX heuristics focus on interface behaviors, engineering taxonomies describe internal pipelines, and ethics checklists address high-level governance. There is no concise, user-facing vocabulary that tells teams what an agent should fundamentally be able to do. We introduce ADEPTS, a capability framework defining a set of core user-facing capabilities to provide unified guidance around the development of AI agents. ADEPTS is based on six principles for human-centered agent design, that express the minimal, user-facing capabilities an AI agent should demonstrate to be understandable, controllable and trustworthy in everyday use. ADEPTS complements existing frameworks and taxonomies; differently from them, it sits at the interface between technical and experience development. By presenting ADEPTS, we aim to condense complex AI-UX requirements into a compact framework that is actionable guidance for AI researchers, designers, engineers, and policy reviewers alike. We believe ADEPTS has the potential of accelerating the improvement of user-relevant agent capabilities, of easing the design of experiences that take advantage of those capabilities, and of providing a shared language to track and discuss progress around the development of AI agents.
Authors:Antonio Perez, Avinash Singh, Jonathan Mitchell, Philip Swadling
Abstract:
Mixed Reality (MR) head mounted displays (HMDs) offer a promising alternative to traditional Flight Simulator Training Device (FSTD) displays, providing immersion, realism and cost efficiency. However, these technologies require management of human factors; cybersickness, visual fatigue and ergonomic strain. If left unmitigated, these effects can hinder pilot performance and training outcomes. For safety critical fields like aviation, addressing human factors challenges is crucial for MR's training potential. This survey systematically reviews the current literature identifying key human factors challenges in MR HMD use in pilot training and examines strategies to mitigate these barriers. Drawing on existing industry standards set by a leading aviation authority, the review adopts a regulatory perspective to explore hardware, software, ergonomic, physiological and psychological interventions improving pilot comfort, safety and training effectiveness in an MR FSTD. Additionally, it evaluates which of these interventions are most appropriate and viable for MR pilot training under existing aviation training regulations, ensuring that technical requirements and pilot wellbeing remain balanced. The findings yield significant insights for the human dimensions of aviation simulation training, highlighting how regulatory considerations shape the practicality of mitigation measures. These insights inform emerging MR aviation training guidelines and best practices, supporting MR's readiness to enhance aviation training.
Authors:Yesica Duarte, Puneet Jain
Abstract:
Virtual Reality (VR) is often described as the "ultimate empathy machine," framing disability as an experience to be simulated through such technologies, which can reduce disability to a spectacle of pity or inspiration. In response, we present Waiting for Hands (WfH), an interactive eXtended Reality (XR) installation that critiques this logic by: (1) repurposing interaction norms in XR through the creation of Alternative Controllers, and (2) staging an absurd XR performance using the built controllers to disrupt sentimentalized disability narratives. The performance involves eight people: two XR participants on stage and six audience members watching a projected documentary about Hema Kumari, an Indian singer living with Rheumatoid Arthritis. The XR users partially obscure the film, drawing attention through strange mouth and hand movements performed in XR. This creates a layered experience that disrupts direct engagement with Hema's story and introduces uncertainty. While XR is often seen as a fully immersive, sensory-dominant medium, this piece subverts that framing by using XR to produce absurdity and alienation. By challenging empathy-driven and pitiable narratives of disability, we ask what ethical stance an XR performance can take to attune participants to non-normative embodiment while resisting spectacle.
Authors:Qi Gong, Ximing Shen, Ziyou Yin, Yaning Li, Ray Lc
Abstract:
Social isolation can lead to pervasive health issues like anxiety and loneliness. Previous work focused on physical interventions like exercise and teleconferencing, but overlooked the narrative potential of adaptive strategies. To address this, we designed a collaborative online storytelling experience in social VR, enabling participants in isolation to design an imaginary space journey as a metaphor for quarantine, in order to learn about their isolation adaptation strategies in the process. Eighteen individuals participated during real quarantine undertaken a virtual role-play experience, designing their own spaceship rooms and engaging in collaborative activities that revealed creative adaptative strategies. Qualitative analyses of participant designs, transcripts, and interactions revealed how they coped with isolation, and how the engagement unexpectedly influenced their adaptation process. This study shows how designing playful narrative experiences, rather than solution-driven approaches, can serve as probes to surface how people navigate social isolation.
Authors:Maisha Maimuna, Minhaz Bin Farukee, Sama Nikanfar, Mahfuza Siddiqua, Ayon Roy, Fillia Makedon
Abstract:
Industrial warehouses are congested with moving forklifts, shelves and personnel, making robot teleoperation particularly risky and demanding for blind and low-vision (BLV) operators. Although accessible teleoperation plays a key role in inclusive workforce participation, systematic research on its use in industrial environments is limited, and few existing studies barely address multimodal guidance designed for BLV users. We present a novel multimodal guidance simulator that enables BLV users to control a mobile robot through a high-fidelity warehouse environment while simultaneously receiving synchronized visual, auditory, and haptic feedback. The system combines a navigation mesh with regular re-planning so routes remain accurate avoiding collisions as forklifts and human avatars move around the warehouse. Users with low vision are guided with a visible path line towards destination; navigational voice cues with clockwise directions announce upcoming turns, and finally proximity-based haptic feedback notifies the users of static and moving obstacles in the path. This real-time, closed-loop system offers a repeatable testbed and algorithmic reference for accessible teleoperation research. The simulator's design principles can be easily adapted to real robots due to the alignment of its navigation, speech, and haptic modules with commercial hardware, supporting rapid feasibility studies and deployment of inclusive telerobotic tools in actual warehouses.
Authors:Ivan C. H. Liu, Chung-En Hao, Jing Xie
Abstract:
Echoes of the Land is an interactive installation that transforms seismic dynamics into a multisensory experience through a scientifically grounded spring-block model. Simulating earthquake recurrence and self-organized criticality, the work generates real-time sound and light via motion capture and concatenative granular synthesis. Each block acts as an agent, producing emergent audiovisual cascades that visualize the physics of rupture and threshold behavior. This work exemplifies the amalgamation of scientific knowledge and artistic practice, opening new avenues for novel forms of musical instrument and narrative medium, while inviting further investigation into the intersection of emergent complexity, aesthetics and interactivity.
Authors:Partha Sarker, Dipto Dey, Marium-E-Jannat
Abstract:
Excessive use of smartphones is a worldwide known issue. In this study, we proposed a notification-based intervention approach to reduce smartphone overuse without making the user feel any annoyance or irritation. Most of the work in this field tried to reduce smartphone overuse by making smartphone use more difficult for the user. In our user study (n = 109), we found that 19.3% of the participants are unwilling to use any usage-limiting application because a) they do not want their smartphone activities to get restricted or b) those applications are annoying. Following that, we devised a hypothesis to minimize smartphone usage among undergraduates. Finally, we designed a prototype for Android, "App Usage Monitor," and conducted a 3-week experiment through which we found proof of concept for our hypothesis. In our prototype, we combined techniques such as nudge and visualization to increase self-awareness among the user by leveraging notifications.
Authors:Luis Montana, Jessica Magallanes, Miguel Juarez, Suzanne Mason, Andrew Narracott, Lindsey van Gemeren, Steven Wood, Maria-Cruz Villa-Uriol
Abstract:
The rapid growth and availability of event sequence data across domains requires effective analysis and exploration methods to facilitate decision-making. Visual analytics combines computational techniques with interactive visualizations, enabling the identification of patterns, anomalies, and attribute interactions. However, existing approaches frequently overlook the interplay between temporal and multivariate attributes. We introduce EventBox, a novel data representation and visual encoding approach for analyzing groups of events and their multivariate attributes. We have integrated EventBox into Sequen-C, a visual analytics system for the analysis of event sequences. To enable the agile creation of EventBoxes in Sequen-C, we have added user-driven transformations, including alignment, sorting, substitution and aggregation. To enhance analytical depth, we incorporate automatically generated statistical analyses, providing additional insight into the significance of attribute interactions. We evaluated our approach involving 21 participants (3 domain experts, 18 novice data analysts). We used the ICE-T framework to assess visualization value, user performance metrics completing a series of tasks, and interactive sessions with domain experts. We also present three case studies with real-world healthcare data demonstrating how EventBox and its integration into Sequen-C reveal meaningful patterns, anomalies, and insights. These results demonstrate that our work advances visual analytics by providing a flexible solution for exploring temporal and multivariate attributes in event sequences.
Authors:Mingchen Li, Wenbo Xu, Wenqing Gu, Yixuan Xie, Yao Zhou, Yunsong Dai, Cheng Tan, Pan Hui
Abstract:
This study examines cross-cultural interactions between Chinese users and self-identified "TikTok Refugees"(foreign users who migrated to RedNote after TikTok's U.S. ban). Based on a dataset of 1,862 posts and 403,054 comments, we use large language model-based sentiment classification and BERT-based topic modelling to explore how both groups engage with the TikTok refugee phenomenon. We analyse what themes foreign users express, how Chinese users respond, how stances (Pro-China, Neutral, Pro-Foreign) shape emotional expression, and how affective responses differ across topics and identities. Results show strong affective asymmetry: Chinese users respond with varying emotional intensities across topics and stances: pride and praise dominate cultural threads, while political discussions elicit high levels of contempt and anger, especially from Pro-China commenters. Pro-Foreign users exhibit the strongest negative emotions across all topics, whereas neutral users express curiosity and joy but still reinforce mainstream discursive norms. Cross-topic comparisons reveal that appearance-related content produces the most emotionally balanced interactions, while politics generates the highest polarization. Our findings reveal distinct emotion-stance structures in Sino-foreign online interactions and offer empirical insights into identity negotiation in transnational digital publics.
Authors:Sharanya Mukherjee, Md Hishaam Akhtar, Kannadasan R
Abstract:
It has always been a rather tough task to communicate with someone possessing a hearing impairment. One of the most tested ways to establish such a communication is through the use of sign based languages. However, not many people are aware of the smaller intricacies involved with sign language. Sign language recognition using computer vision aims at eliminating the communication barrier between deaf-mute and ordinary people so that they can properly communicate with others. Recently the pandemic has left the whole world shaken up and has transformed the way we communicate. Video meetings have become essential for everyone, even people with a hearing disability. In recent studies, it has been found that people with hearing disabilities prefer to sign over typing during these video calls. In this paper, we are proposing a browser extension that will automatically translate sign language to subtitles for everyone else in the video call. The Large-scale dataset which contains more than 2000 Word-Level ASL videos, which were performed by over 100 signers will be used.
Authors:Roxanne Ziman, Shehryar Saharan, Gaël McGill, Laura Garrison
Abstract:
We contribute an in-depth analysis of the workflows and tensions arising from generative AI (genAI) use in biomedical visualization (BioMedVis). Although genAI affords facile production of aesthetic visuals for biological and medical content, the architecture of these tools fundamentally limits the accuracy and trustworthiness of the depicted information, from imaginary (or fanciful) molecules to alien anatomy. Through 17 interviews with a diverse group of practitioners and researchers, we qualitatively analyze the concerns and values driving genAI (dis)use for the visual representation of spatially-oriented biomedical data. We find that BioMedVis experts, both in roles as developers and designers, use genAI tools at different stages of their daily workflows and hold attitudes ranging from enthusiastic adopters to skeptical avoiders of genAI. In contrasting the current use and perspectives on genAI observed in our study with predictions towards genAI in the visualization pipeline from prior work, we refocus the discussion of genAI's effects on projects in visualization in the here and now with its respective opportunities and pitfalls for future visualization research. At a time when public trust in science is in jeopardy, we are reminded to first do no harm, not just in biomedical visualization but in science communication more broadly. Our observations reaffirm the necessity of human intervention for empathetic design and assessment of accurate scientific visuals.
Authors:Angjelin Hila, Elliott Hauser
Abstract:
In this study, we investigate the use of large language models (LLMs), specifically ChatGPT, for structured deductive qualitative coding. While most current research emphasizes inductive coding applications, we address the underexplored potential of LLMs to perform deductive classification tasks aligned with established human-coded schemes. Using the Comparative Agendas Project (CAP) Master Codebook, we classified U.S. Supreme Court case summaries into 21 major policy domains. We tested four intervention methods: zero-shot, few-shot, definition-based, and a novel Step-by-Step Task Decomposition strategy, across repeated samples. Performance was evaluated using standard classification metrics (accuracy, F1-score, Cohen's kappa, Krippendorff's alpha), and construct validity was assessed using chi-squared tests and Cramer's V. Chi-squared and effect size analyses confirmed that intervention strategies significantly influenced classification behavior, with Cramer's V values ranging from 0.359 to 0.613, indicating moderate to strong shifts in classification patterns. The Step-by-Step Task Decomposition strategy achieved the strongest reliability (accuracy = 0.775, kappa = 0.744, alpha = 0.746), achieving thresholds for substantial agreement. Despite the semantic ambiguity within case summaries, ChatGPT displayed stable agreement across samples, including high F1 scores in low-support subclasses. These findings demonstrate that with targeted, custom-tailored interventions, LLMs can achieve reliability levels suitable for integration into rigorous qualitative coding workflows.
Authors:Abhishek Bhattacharjee, Jack Pilkington, Nita Farahany
Abstract:
Brain foundation models represent a new frontier in AI: instead of processing text or images, these models interpret real-time neural signals from EEG, fMRI, and other neurotechnologies. When integrated with brain-computer interfaces (BCIs), they may enable transformative applications-from thought controlled devices to neuroprosthetics-by interpreting and acting on brain activity in milliseconds. However, these same systems pose unprecedented risks, including the exploitation of subconscious neural signals and the erosion of cognitive liberty. Users cannot easily observe or control how their brain signals are interpreted, creating power asymmetries that are vulnerable to manipulation. This paper proposes embedding fiduciary duties-loyalty, care, and confidentiality-directly into BCI-integrated brain foundation models through technical design. Drawing on legal traditions and recent advancements in AI alignment techniques, we outline implementable architectural and governance mechanisms to ensure these systems act in users' best interests. Placing brain foundation models on a fiduciary footing is essential to realizing their potential without compromising self-determination.
Authors:Tudor Matei Opran, Samir Loudni
Abstract:
We address the pattern explosion problem in pattern mining by proposing an interactive learning framework that combines nonlinear utility aggregation with geometry-aware query selection. Our method models user preferences through a Choquet integral over multiple interestingness measures and exploits the geometric structure of the version space to guide the selection of informative comparisons. A branch-and-bound strategy with tight distance bounds enables efficient identification of queries near the decision boundary. Experiments on UCI datasets show that our approach outperforms existing methods such as ChoquetRank, achieving better ranking accuracy with fewer user interactions.
Authors:Jochen Wulf, Jurg Meierhofer, Frank Hannich
Abstract:
Agentic AI systems, powered by Large Language Models (LLMs), offer transformative potential for value co-creation in technical services. However, persistent challenges like hallucinations and operational brittleness limit their autonomous use, creating a critical need for robust frameworks to guide human-AI collaboration. Drawing on established Human-AI teaming research and analogies from fields like autonomous driving, this paper develops a structured taxonomy of human-agent interaction. Based on case study research within technical support platforms, we propose a six-mode taxonomy that organizes collaboration across a spectrum of AI autonomy. This spectrum is anchored by the Human-Out-of-the-Loop (HOOTL) model for full automation and the Human-Augmented Model (HAM) for passive AI assistance. Between these poles, the framework specifies four distinct intermediate structures. These include the Human-in-Command (HIC) model, where AI proposals re-quire mandatory human approval, and the Human-in-the-Process (HITP) model for structured work-flows with deterministic human tasks. The taxonomy further delineates the Human-in-the-Loop (HITL) model, which facilitates agent-initiated escalation upon uncertainty, and the Human-on-the-Loop (HOTL) model, which enables discretionary human oversight of an autonomous AI. The primary contribution of this work is a comprehensive framework that connects this taxonomy to key contingency factors -- such as task complexity, operational risk, and system reliability -- and their corresponding conceptual architectures. By providing a systematic method for selecting and designing an appropriate level of human oversight, our framework offers practitioners a crucial tool to navigate the trade-offs between automation and control, thereby fostering the development of safer, more effective, and context-aware technical service systems.
Authors:Florian Grensing, Vanessa Schmücker, Anne Sophie Hildebrand, Tim Klucken, Maria Maleshkova
Abstract:
Phobias significantly impact the quality of life of affected persons. Two methods of assessing anxiety responses are questionnaires and behavioural avoidance tests (BAT). While these can be used in a clinical environment they only record momentary insights into anxiety measures. In this study, we estimate the intensity of anxiety during these BATs, using physiological data collected from unobtrusive, wrist-worn sensors. Twenty-five participants performed four different BATs in a single session, while periodically being asked how anxious they currently are. Using heart rate, heart rate variability, electrodermal activity, and skin temperature, we trained regression models to predict anxiety ratings from three types of input data: (1) using only physiological signals, (2) adding computed features (e.g., min, max, range, variability), and (3) computed features combined with contextual task information. Adding contextual information increased the effectiveness of the model, leading to a root mean squared error (RMSE) of 0.197 and a mean absolute error (MAE) of 0.041. Overall, this study shows, that data obtained from wearables can continuously provide meaningful estimations of anxiety, which can assist in therapy planning and enable more personalised treatment.
Authors:John Twomey, Sarah Foley, Sarah Robinson, Michael Quayle, Matthew Peter Aylett, Conor Linehan, Gillian Murphy
Abstract:
Deepfake technology is often used to create non-consensual synthetic intimate imagery (NSII), mainly of celebrity women. Through Critical Discursive Psychological analysis we ask; i) how celebrities construct being targeted by deepfakes and ii) how they navigate infrastructural and social obstacles when seeking recourse. In this paper, we adopt Baumers concept of Usees (stakeholders who are non-consenting, unaware and directly targeted by technology), to understand public statements made by eight celebrity women and one non-binary individual targeted with NSII. Celebrities describe harms of being non-consensually targeted by deepfakes and the distress of becoming aware of these videos. They describe various infrastructural/social factors (e.g. blaming/ silencing narratives and the industry behind deepfake abuse) which hinder activism and recourse. This work has implications in recognizing the roles of various stakeholders in the infrastructures underlying deepfake abuse and the potential of human-computer interaction to improve existing recourses for NSII. We also contribute to understanding how false beliefs online facilitate deepfake abuse. Future work should involve interventions which challenge the values and false beliefs which motivate NSII creation/dissemination.
Authors:Amanda Menking, Mona Elswah, David J. Grüning, Lasse H. Hansen, Irene Huang, Julia Kamin, Catrine Normann
Abstract:
As the field of Trust and Safety in digital spaces continues to grow, it has become increasingly necessary - but also increasingly complex - to collaborate on research across the academic, industry, governmental and non-governmental sectors. This paper examines how cross-affiliation research partnerships can be structured to overcome misaligned incentives, timelines and constraints while delivering on the unique strengths of each stakeholder. Drawing on our own experience of cross-sector collaboration, we define the main types of affiliation and highlight the common differences in research priorities, operational pressures and evaluation metrics across sectors. We then propose a practical, step-by-step framework for initiating and managing effective collaborations, including strategies for building trust, aligning goals, and distributing roles. We emphasize the critical yet often invisible work of articulation and argue that cross-sector partnerships are essential for developing more ethical, equitable and impactful research in trust and safety. Ultimately, we advocate collaborative models that prioritize inclusivity, transparency and real-world relevance in order to meet the interdisciplinary demands of this emerging field.
Authors:Rishane Dassanayake, Mario Demetroudi, James Walpole, Lindley Lentati, Jason R. Brown, Edward James Young
Abstract:
Frontier AI systems are rapidly advancing in their capabilities to persuade, deceive, and influence human behaviour, with current models already demonstrating human-level persuasion and strategic deception in specific contexts. Humans are often the weakest link in cybersecurity systems, and a misaligned AI system deployed internally within a frontier company may seek to undermine human oversight by manipulating employees. Despite this growing threat, manipulation attacks have received little attention, and no systematic framework exists for assessing and mitigating these risks. To address this, we provide a detailed explanation of why manipulation attacks are a significant threat and could lead to catastrophic outcomes. Additionally, we present a safety case framework for manipulation risk, structured around three core lines of argument: inability, control, and trustworthiness. For each argument, we specify evidence requirements, evaluation methodologies, and implementation considerations for direct application by AI companies. This paper provides the first systematic methodology for integrating manipulation risk into AI safety governance, offering AI companies a concrete foundation to assess and mitigate these threats before deployment.
Authors:J. M. Chan Sri Manukalpa, H. S. Bopage, W. A. M. Jayawardena, P. K. P. G. Panduwawala
Abstract:
Structural pests, such as termites, pose a serious threat to wooden buildings, resulting in significant economic losses due to their hidden and progressive damage. Traditional detection methods, such as visual inspections and chemical treatments, are invasive, labor intensive, and ineffective for early stage infestations. To bridge this gap, this study proposes a non invasive deep learning based acoustic classification framework for early termite detection. We aim to develop a robust, scalable model that distinguishes termite generated acoustic signals from background noise. We introduce a hybrid Convolutional Neural Network Long Short Term Memory architecture that captures both spatial and temporal features of termite activity. Audio data were collected from termite infested and clean wooden samples. We extracted Mel Frequency Cepstral Coefficients and trained the CNN LSTM model to classify the signals. Experimental results show high performance, with 94.5% accuracy, 93.2% precision, and 95.8% recall. Comparative analysis reveals that the hybrid model outperforms standalone CNN and LSTM architectures, underscoring its combined strength. Notably, the model yields low false-negative rates, which is essential for enabling timely intervention. This research contributes a non invasive, automated solution for early termite detection, with practical implications for improved pest monitoring, minimized structural damage, and better decision making by homeowners and pest control professionals. Future work may integrate IoT for real time alerts and extend detection to other structural pests.
Authors:Kai Malcolm, César Uribe, Momona Yamagami
Abstract:
Invasive and non-invasive neural interfaces hold promise as high-bandwidth input devices for next-generation technologies. However, neural signals inherently encode sensitive information about an individual's identity and health, making data sharing for decoder training a critical privacy challenge. Federated learning (FL), a distributed, privacy-preserving learning framework, presents a promising solution, but it remains unexplored in closed-loop adaptive neural interfaces. Here, we introduce FL-based neural decoding and systematically evaluate its performance and privacy using high-dimensional electromyography signals in both open- and closed-loop scenarios. In open-loop simulations, FL significantly outperformed local learning baselines, demonstrating its potential for high-performance, privacy-conscious neural decoding. In contrast, closed-loop user studies required adapting FL methods to accommodate single-user, real-time interactions, a scenario not supported by standard FL. This modification resulted in local learning decoders surpassing the adapted FL approach in closed-loop performance, yet local learning still carried higher privacy risks. Our findings highlight a critical performance-privacy tradeoff in real-time adaptive applications and indicate the need for FL methods specifically designed for co-adaptive, single-user applications.
Authors:Josephine Beatrice Skovbo Borre, Malene Gorm Wold, Sara Kjær Rasmussen, Ilhan Aslan
Abstract:
This study explores how age and language shape the deliberate vocal expression of emotion, addressing underexplored user groups, Teenagers (N = 12) and Adults 55+ (N = 12), within speech emotion recognition (SER). While most SER systems are trained on spontaneous, monolingual English data, our research evaluates how such models interpret intentionally performed emotional speech across age groups and languages (Danish and English). To support this, we developed a novel experimental paradigm combining a custom user interface with a backend for real-time SER prediction and data logging. Participants were prompted to hit visual targets in valence-arousal space by deliberately expressing four emotion targets. While limitations include some reliance on self-managed voice recordings and inconsistent task execution, the results suggest contrary to expectations, no significant differences between language or age groups, and a degree of cross-linguistic and age robustness in model interpretation. Though some limitations in high-arousal emotion recognition were evident. Our qualitative findings highlight the need to move beyond system-centered accuracy metrics and embrace more inclusive, human-centered SER models. By framing emotional expression as a goal-directed act and logging the real-time gap between human intent and machine interpretation, we expose the risks of affective misalignment.
Authors:Xiao Pang, Yan Huang, Chang Liu, JiYuan Liu, MingYou Liu
Abstract:
Acquiring medical expertise is a critical component of medical education and professional development. While existing studies focus primarily on constructing medical knowledge bases or developing learning tools based on the structured, private healthcare data, they often lack methods for extracting expertise from unstructured medical texts. These texts constitute a significant portion of medical literature and offer greater flexibility and detail compared to structured data formats. Furthermore, many studies fail to provide explicit analytical and learning pathways in this context.
This paper introduces MExplore, an interactive visual analytics system designed to support the acquisition of medical expertise. To address the challenges of the inconsistencies and confidentiality concerns inherent in unstructured medical texts, we propose a workflow that employs a fine-tuned BERT-based model to extract medical entities (MEs) from them. We then present a novel multilevel visual analysis framework that integrates multiple coordinated visualizations, enabling a progressive and interactive exploration of medical knowledge.
To assess the effectiveness of MExplore, we conducted three case studies, a user study, and interviews with domain experts. The results indicate that the system significantly enhances the medical expertise acquisition process, providing an effective interactive approach for acquiring and retaining knowledge from medical texts.
Authors:Garyoung Kim, Huisung Kwon, Seoju Yun, Yu-Won Youn
Abstract:
Generative AI does not only replicate human creativity but also reproduces deep-seated cultural biases, making it crucial to critically examine how concepts like ugliness are understood and expressed by these tools. This study investigates how four different generative AI models understand and express ugliness through text and image and explores the biases embedded within these representations. We extracted 13 adjectives associated with ugliness through iterative prompting of a large language model and generated 624 images across four AI models and three prompts. Demographic and socioeconomic attributes within the images were independently coded and thematically analyzed. Our findings show that AI models disproportionately associate ugliness with old white male figures, reflecting entrenched social biases as well as paradoxical biases, where efforts to avoid stereotypical depictions of marginalized groups inadvertently result in the disproportionate projection of negative attributes onto majority groups. Qualitative analysis further reveals that, despite supposed attempts to frame ugliness within social contexts, conventional physical markers such as asymmetry and aging persist as central visual motifs. These findings demonstrate that despite attempts to create more equal representations, generative AI continues to perpetuate inherited and paradoxical biases, underscoring the critical work being done to create ethical AI training paradigms and advance methodologies for more inclusive AI development.
Authors:Pengyu Zhu, Janghee Cho
Abstract:
Adolescents' mobile technology use is often regulated through rigid control mechanisms that fail to account for their autonomy and natural usage patterns. Drawing on Taoist philosophy, particularly Wu Wei, Yin-Yang, and Zi Ran, this position paper proposes Tao-Technology, a self-organizing, adaptive regulatory framework. Integrating insights from Reflective Informatics and Information Ecologies, we explore how mobile technology can dynamically adjust to context while fostering self-reflection and meaning-making. This approach shifts from external restrictions to dynamic co-adaptative regulation, ensuring technology governance remains flexible yet structured, supporting adolescents in cultivating a balanced and intentional relationship with digital technology.
Authors:Florian David, Michael Chan, Elenor Morgenroth, Patrik Vuilleumier, Dimitri Van De Ville
Abstract:
We propose an end-to-end deep neural encoder-decoder model to encode and decode brain activity in response to naturalistic stimuli using functional magnetic resonance imaging (fMRI) data. Leveraging temporally correlated input from consecutive film frames, we employ temporal convolutional layers in our architecture, which effectively allows to bridge the temporal resolution gap between natural movie stimuli and fMRI acquisitions. Our model predicts activity of voxels in and around the visual cortex and performs reconstruction of corresponding visual inputs from neural activity. Finally, we investigate brain regions contributing to visual decoding through saliency maps. We find that the most contributing regions are the middle occipital area, the fusiform area, and the calcarine, respectively employed in shape perception, complex recognition (in particular face perception), and basic visual features such as edges and contrasts. These functions being strongly solicited are in line with the decoder's capability to reconstruct edges, faces, and contrasts. All in all, this suggests the possibility to probe our understanding of visual processing in films using as a proxy the behaviour of deep learning models such as the one proposed in this paper.
Authors:Tadahiro Taniguchi, Masatoshi Nagano, Haruumi Omoto, Yoshiki Hayashi
Abstract:
Collective human activities like using an Ouija board (or Kokkuri-san) often produce emergent, coherent linguistic outputs unintended by any single participant. While psychological explanations such as the ideomotor effect exist, a computational understanding of how decentralized, implicit linguistic knowledge fuses through shared physical interaction remains elusive. We introduce CoCre-Sam (Collective-Creature Sampling), a framework modeling this phenomenon as collective Langevin dynamics sampling from implicitly fused language models. Each participant is represented as an agent associated with an energy landscape derived from an internal language model reflecting linguistic priors, and agents exert stochastic forces based on local energy gradients. We theoretically prove that the collective motion of the shared pointer (planchette) corresponds to Langevin MCMC sampling from the sum of individual energy landscapes, representing fused collective knowledge. Simulations validate that CoCre-Sam dynamics effectively fuse different models and generate meaningful character sequences, while ablation studies confirm the essential roles of collective interaction and stochasticity. Altogether, CoCre-Sam provides a novel computational mechanism linking individual implicit knowledge, embodied collective action, and emergent linguistic phenomena, grounding these complex interactions in the principles of probabilistic sampling.
Authors:Benjamin Watson, Alinda Friedman, Aaron McGaffey
Abstract:
This paper is a study of techniques for measuring and predicting visual fidelity. As visual stimuli we use polygonal models, and vary their fidelity with two different model simplification algorithms. We also group the stimuli into two object types: animals and man made artifacts. We examine three different experimental techniques for measuring these fidelity changes: naming times, ratings, and preferences. All the measures were sensitive to the type of simplification and level of simplification. However, the measures differed from one another in their response to object type. We also examine several automatic techniques for predicting these experimental measures, including techniques based on images and on the models themselves. Automatic measures of fidelity were successful at predicting experimental ratings, less successful at predicting preferences, and largely failures at predicting naming times. We conclude with suggestions for use and improvement of the experimental and automatic measures of visual fidelity.
Authors:Fernando Koch, Jessica Nahulan, Jeremy Fox, Martin Keen
Abstract:
Emotional cues frequently arise and shape group dynamics in interactive settings where multiple humans and artificial agents communicate through shared digital channels. While artificial agents lack intrinsic emotional states, they can simulate affective behavior using synthetic modalities such as text or speech. This work introduces a model for orchestrating emotion contagion, enabling agents to detect emotional signals, infer group mood patterns, and generate targeted emotional responses. The system captures human emotional exchanges and uses this insight to produce adaptive, generative responses that influence group affect in real time. The model supports applications in collaborative, educational, and social environments by shifting affective computing from individual-level reactions to coordinated, group-level emotion modulation. We present the system architecture and provide experimental results that illustrate its effectiveness in sensing and steering group mood dynamics.
Authors:Ze Dong, Binyang Han, Jingjing Zhang, Ruoyu Wen, Barrett Ens, Adrian Clark, Tham Piumsomboon
Abstract:
The integration of extended reality (XR) with artificial intelligence (AI) introduces a new paradigm for user interaction, enabling AI to perceive user intent, stimulate the senses, and influence decision-making. We explored the impact of four AI-driven visualisation techniques -- `Inform,' `Nudge,' `Recommend,' and `Instruct' -- on user decision-making in XR using the Meta Quest Pro. To test these techniques, we used a pre-recorded 360-degree video of a supermarket, overlaying each technique through a virtual interface. We aimed to investigate how these different visualisation techniques with different levels of user autonomy impact preferences and decision-making. An exploratory study with semi-structured interviews provided feedback and design recommendations. Our findings emphasise the importance of maintaining user autonomy, enhancing AI transparency to build trust, and considering context in visualisation design.
Authors:Juhee Bae, Benjamin Watson
Abstract:
Traditional layered graph depictions such as flow charts are in wide use. Yet as graphs grow more complex, these depictions can become difficult to understand. Quilts are matrix-based depictions for layered graphs designed to address this problem. In this research, we first improve Quilts by developing three design alternatives, and then compare the best of these alternatives to better-known node-link and matrix depictions. A primary weakness in Quilts is their depiction of skip links, links that do not simply connect to a succeeding layer. Therefore in our first study, we compare Quilts using color-only, text-only, and mixed (color and text) skip link depictions, finding that path finding with the color-only depiction is significantly slower and less accurate, and that in certain cases, the mixed depiction offers an advantage over the text-only depiction. In our second study, we compare Quilts using the mixed depiction to node-link diagrams and centered matrices. Overall results show that users can find paths through graphs significantly faster with Quilts (46.6 secs) than with node-link (58.3 secs) or matrix (71.2 secs) diagrams. This speed advantage is still greater in large graphs (e.g. in 200 node graphs, 55.4 secs vs. 71.1 secs for node-link and 84.2 secs for matrix depictions).
Authors:Hamzah Ziadeh, Hendrik Knoche
Abstract:
Research into explainable artificial intelligence (XAI) for data analysis tasks suffer from a large number of contradictions and lack of concrete design recommendations stemming from gaps in understanding the tasks that require AI assistance. In this paper, we drew on multiple fields such as visual analytics, cognition, and dashboard design to propose a method for categorising and comparing XAI studies under three dimensions: what, why, and who. We identified the main problems as: inadequate descriptions of tasks, context-free studies, and insufficient testing with target users. We propose that studies should specifically report on their users' domain, AI, and data analysis expertise to illustrate the generalisability of their findings. We also propose study guidelines for designing and reporting XAI tasks to improve the XAI community's ability to parse the rapidly growing field. We hope that our contribution can help researchers and designers better identify which studies are most relevant to their work, what gaps exist in the research, and how to handle contradictory results regarding XAI design.
Authors:Jeongone Seo, Kyung-zoon Hong, Sol Baik
Abstract:
INTRODUCTION: Older adults with early-stage dementia often retain procedural memory, enabling continued use of familiar technologies. Additionally, symbolic anchors such as photos or personalized content may serve as memory cues to reinforce digital engagement. This study explores how these mechanisms support technology use in dementia care within the South Korean context.
METHODS: We conducted in-depth interviews with 11 professional caregivers of community-dwelling older adults with cognitive decline. Grounded theory methods guided the analysis, using iterative coding and constant comparison to identify emergent themes.
RESULTS: Caregivers reported that familiar digital routines (e.g., taking photos) persisted through procedural memory. Symbolic anchors such as family photos or recognizable icons enhanced interaction and emotional engagement. However, unfamiliar or anthropomorphic technologies often triggered fear or symbolic resistance.
DISCUSSION: Findings highlight the dual role of procedural memory and symbolic anchors in sustaining digital engagement. Designing culturally responsive and cognitively accessible technologies may enhance autonomy and well-being in dementia care.
Keywords: procedural memory, symbolic anchors, dementia care, digital engagement, older adults, cultural adaptation, caregiving technologies
Authors:Zoe Kaputa, Anika Rajaram, Vryan Almanon Feliciano, Zhuoyue Lyu, Maneesh Agrawala, Hari Subramonyam
Abstract:
Programming-by-prompting with generative AI offers a new paradigm for end-user programming, shifting the focus from syntactic fluency to semantic intent. This shift holds particular promise for non-programmers such as educators, who can describe instructional goals in natural language to generate interactive learning content. Yet in bypassing direct code authoring, many of programming's core affordances - such as traceability, stepwise refinement, and behavioral testing - are lost. We propose the Chain-of-Abstractions (CoA) framework as a way to recover these affordances while preserving the expressive flexibility of natural language. CoA decomposes the synthesis process into a sequence of cognitively meaningful, task-aligned representations that function as checkpoints for specification, inspection, and refinement. We instantiate this approach in SimStep, an authoring environment for teachers that scaffolds simulation creation through four intermediate abstractions: Concept Graph, Scenario Graph, Learning Goal Graph, and UI Interaction Graph. To address ambiguities and misalignments, SimStep includes an inverse correction process that surfaces in-filled model assumptions and enables targeted revision without requiring users to manipulate code. Evaluations with educators show that CoA enables greater authoring control and interpretability in programming-by-prompting workflows.
Authors:Lo Gullstrand Heander, Emma Söderberg, Christofer Rydenfält
Abstract:
Code review is a well-established and valued practice in the software engineering community contributing to both code quality and interpersonal benefits. However, there are challenges in both tools and processes that give rise to misalignments and frustrations. Recent research seeks to address this by automating code review entirely, but we believe that this risks losing the majority of the interpersonal benefits such as knowledge transfer and shared ownership.
We believe that by better understanding the cognitive processes involved in code review, it would be possible to improve tool support, with out without AI, and make code review both more efficient, more enjoyable, while increasing or maintaining all of its benefits. In this paper, we conduct an ethnographic think-aloud study involving 10 participants and 34 code reviews. We build a cognitive model of code review bottom up through thematic, statistical, temporal, and sequential analysis of the transcribed material. Through the data, the similarities between the cognitive process in code review and decision-making processes, especially recognition-primed decision-making, become apparent.
The result is the Code Review as Decision-Making (CRDM) model that shows how the developers move through two phases during the code review; first an orientation phase to establish context and rationale and then an analytical phase to understand, assess, and plan the rest of the review. Throughout the process several decisions must be taken, on writing comments, finding more information, voting, running the code locally, verifying continuous integration results, etc.
Analysis software and process-coded data publicly available at: https://doi.org/10.5281/zenodo.15758266
Authors:Hrittika Bhowmick, Shilpaa Anand
Abstract:
Prototyping is widely regarded in Human-Computer Interaction as an iterative process through which ideas are tested and refined, often via visual mockups, screen flows, and coded simulations. This position paper critiques the visual-centric norms embedded in prototyping culture by drawing from the lived experiences of blind scholars and insights from cultural disability studies. It discusses how dominant methods of prototyping rely on an unexamined fidelity to sight, privileging what can be rendered visibly coherent while marginalizing other modes of knowing and making. By repositioning prototyping as a situated, embodied, and relational practice, this paper challenges HCI to rethink what kinds of design participation are legitimized and which are excluded when prototyping is reduced to screen-based simulations.
Authors:Andreas Pramendorfer, Rainhard Dieter Findling
Abstract:
Protecting personal computers (PCs) from unauthorized access typically relies on password authentication, which is know to suffer from cognitive burden and weak credentials. As many users nowadays carry mobile devices with advanced security features throughout their day, there is an opportunity to leverage these devices to improve authentication to PCs. In this paper we utilize a token-based passwordless approach where users authenticate to their PC by confirming the authentication request on their smartphones or smartwatches. Upon a request to login to the PC, or to evaluate privileges, the PC issues an authentication request that users receive on their mobile devices, where users can confirm or deny the request. We evaluate button tap and biometric fingerprint verification as confirmation variants, and compare their authentication duration, success rate, and usability to traditional password-based authentication in a user study with 30 participants and a total of 1,200 authentication attempts. Smartwatch-based authentication outperformed password-based authentication and smartphone-based variants in authentication duration, while showing comparable success rates. Participants rated smartwatch-based authentication highest in usability, followed by password-based authentication and smartphone-based authentication.
Authors:Mohammad Abolnejadian, Shakiba Amirshahi, Matthew Brehmer, Anamaria Crisan
Abstract:
In decision-making conversations, experts must navigate complex choices and make on-the-spot decisions while engaged in conversation. Although extensive historical data often exists, the real-time nature of these scenarios makes it infeasible for decision-makers to review and leverage relevant information. This raises an interesting question: What if experts could utilize relevant past data in real-time decision-making through insights derived from past data? To explore this, we implemented a conversational user interface, taking doctor-patient interactions as an example use case. Our system continuously listens to the conversation, identifies patient problems and doctor-suggested solutions, and retrieves related data from an embedded dataset, generating concise insights using a pipeline built around a retrieval-based Large Language Model (LLM) agent. We evaluated the prototype by embedding Health Canada datasets into a vector database and conducting simulated studies using sample doctor-patient dialogues, showing effectiveness but also challenges, setting directions for the next steps of our work.
Authors:Katherine Limes, Nathan Malkin, Kelsey R. Fulton
Abstract:
Increasingly, students begin learning aspects of security and privacy during their primary and secondary education (grades K-12 in the United States). Individual U.S. states and some national organizations publish teaching standards -- guidance that outlines expectations for what students should learn -- which often form the basis for course curricula. However, research has not yet examined what is covered by these standards and whether the topics align with what the broader security and privacy community thinks students should know. To shed light on these questions, we started by collecting computer science teaching standards from all U.S. states and eight national organizations. After manually examining a total of 11,954 standards, we labeled 3,778 of them as being related to security and privacy, further classifying these into 103 topics. Topics ranged from technical subjects like encryption, network security, and embedded systems to social subjects such as laws, ethics, and appropriate online behavior. Subsequently, we interviewed 11 security and privacy professionals to examine how the teaching standards align with their expectations. We found that, while the specific topics they mentioned mostly overlapped with those of existing standards, professionals placed a greater emphasis on threat modeling and security mindset.
Authors:Jose Gonzalez-Belmonte, Jaerock Kwon
Abstract:
As we move towards a future of autonomous vehicles, questions regarding their method of communication have arisen. One of the common questions concerns the placement of the signaling used to communicate with pedestrians and road users, but little work has been published fully dedicated to exploring this. This paper uses a simulation made in the Unity game engine to record the visibility of fifteen different vehicles, specifically regarding the visibility of frontal elements by a pedestrian on the sidewalk. Variables include the vehicle position, number of vehicles on the road, and minimum and maximum distance of the recorded points. It was concluded that the areas of the vehicle most often seen by pedestrians on the sidewalk attempting to cross the road were the frontal frontal fenders and the headlights, with the frontal wheels, frontal doors, bumper, and side mirrors are less visible alternatives. These findings are valuable in the future design of signaling for autonomous vehicles, in order to ensure pedestrians are able to see them on approaching vehicles. The software used provides a platform for similar works in the future to be conducted.
Authors:Clarice Hilton, Kat Hawkins, Phill Tew, Freddie Collins, Seb Madgwick, Dominic Potts, Tom Mitchell
Abstract:
Motion capture technologies are increasingly used in creative and performance contexts but often exclude disabled practitioners due to normative assumptions in body modeling, calibration, and avatar representation. EqualMotion introduces a body-agnostic, wearable motion capture system designed through a disability-centred co-design approach. By enabling personalised calibration, integrating mobility aids, and adopting an inclusive visual language, EqualMotion supports diverse body types and movement styles. The system is developed collaboratively with disabled researchers and creatives, aiming to foster equitable participation in digital performance and prototyping. This paper outlines the system's design principles and highlights ongoing case studies in dance and music to evaluate accessibility in real-world creative workflows.
Authors:Fernando Ayach, Vitor Lameirão, Raul Leão, Jerfferson Felizardo, Rafael Sobrinho, Vanessa Borges, PatrÃcia Matsubara, Awdren Fontão
Abstract:
Proto-personas are commonly used during early-stage Product Discovery, such as Lean Inception, to guide product definition and stakeholder alignment. However, the manual creation of proto-personas is often time-consuming, cognitively demanding, and prone to bias. In this paper, we propose and empirically investigate a prompt engineering-based approach to generate proto-personas with the support of Generative AI (GenAI). Our goal is to evaluate the approach in terms of efficiency, effectiveness, user acceptance, and the empathy elicited by the generated personas. We conducted a case study with 19 participants embedded in a real Lean Inception, employing a qualitative and quantitative methods design. The results reveal the approach's efficiency by reducing time and effort and improving the quality and reusability of personas in later discovery phases, such as Minimum Viable Product (MVP) scoping and feature refinement. While acceptance was generally high, especially regarding perceived usefulness and ease of use, participants noted limitations related to generalization and domain specificity. Furthermore, although cognitive empathy was strongly supported, affective and behavioral empathy varied significantly across participants. These results contribute novel empirical evidence on how GenAI can be effectively integrated into software Product Discovery practices, while also identifying key challenges to be addressed in future iterations of such hybrid design processes.
Authors:Sonali Sharma, Ahmed M. Alaa, Roxana Daneshjou
Abstract:
Generative AI models, including large language models (LLMs) and vision-language models (VLMs), are increasingly used to interpret medical images and answer clinical questions. Their responses often include inaccuracies; therefore, safety measures like medical disclaimers are critical to remind users that AI outputs are not professionally vetted or a substitute for medical advice. This study evaluated the presence of disclaimers in LLM and VLM outputs across model generations from 2022 to 2025. Using 500 mammograms, 500 chest X-rays, 500 dermatology images, and 500 medical questions, outputs were screened for disclaimer phrases. Medical disclaimer presence in LLM and VLM outputs dropped from 26.3% in 2022 to 0.97% in 2025, and from 19.6% in 2023 to 1.05% in 2025, respectively. By 2025, the majority of models displayed no disclaimers. As public models become more capable and authoritative, disclaimers must be implemented as a safeguard adapting to the clinical context of each output.
Authors:Yuan Li, Teja Mandaloju, Haihua Chen
Abstract:
This study investigates public perceptions of generative artificial intelligence (GenAI) in libraries through a large-scale analysis of posts on X (formerly Twitter). Using a mixed-method approach that combines temporal trend analysis, sentiment classification, and social network analysis, this paper explores how public discourse around GenAI and libraries has evolved over time, the emotional tones that dominate the conversation, and the key users or organizations driving engagement. The findings reveal that discussions are predominantly negative in tone, with surges linked to concerns about ethics and intellectual property. Furthermore, social network analysis identifies both institutional authority and individual bridge users who facilitate cross-domain engagement. The results in this paper contribute to the growing body of literature on GenAI in the library and GLAM (Galleries, Libraries, Archives, and Museums) sectors and offer a real-time, public-facing perspective on the emerging opportunities and concerns GenAI presents.
Authors:Janin Koch, Vitor Fortes Rey
Abstract:
This position paper looks at differences between the current understandings of human-centered explainability and explainability AI. We discuss current ideas in both fields, as well as the differences and opportunities we discovered. As an example of combining both, we will present preliminary work on a new algebraic machine learning approach. We are excited to continue discussing design opportunities for human-centered explainability (HCx) and xAI with the broader HCxAI community.
Authors:Mathieu Phosanarack, Laura Wallard, Sophie Lepreux, Christophe Kolski, Eugénie Avril
Abstract:
Markerless Motion Capture (MoCap) using smartphone cameras is a promising approach to making exergames more accessible and cost-effective for health and rehabilitation. Unlike traditional systems requiring specialized hardware, recent advancements in AI-powered pose estimation enable movement tracking using only a mobile device. For an upcoming study, a mobile application with real-time exergames including markerless motion capture is being developed. However, implementing such technology introduces key challenges, including balancing accuracy and real-time responsiveness, ensuring proper user interaction. Future research should explore optimizing AI models for realtime performance, integrating adaptive gamification, and refining user-centered design principles. By overcoming these challenges, smartphone-based exergames could become powerful tools for engaging users in physical activity and rehabilitation, extending their benefits to a broader audience.
Authors:Jeremy Fischer, Ram Krishnamoorthy, Vishal Kumar, Mahdi Al-Husseini
Abstract:
Medical evacuation is one of the United States Army's most storied and critical mission sets, responsible for efficiently and expediently evacuating the battlefield ill and injured. Medical evacuation planning involves designing a robust network of medical platforms and facilities capable of moving and treating large numbers of casualties. Until now, there has not been a medium to simulate these networks in a classroom setting and evaluate both offline planning and online decision-making performance. This work describes the Medical Evacuation Wargaming Initiative (MEWI), a three-dimensional multiplayer simulation developed in Unity that replicates battlefield constraints and uncertainties. MEWI accurately models patient interactions at casualty collection points, ambulance exchange points, medical treatment facilities, and evacuation platforms. Two operational scenarios are introduced: an amphibious island assault in the Pacific and a Eurasian conflict across a sprawling road and river network. These scenarios pit students against the clock to save as many casualties as possible while adhering to doctrinal lessons learned during didactic training. We visualize performance data collected from two iterations of the MEWI Pacific scenario executed in the United States Army's Medical Evacuation Doctrine Course. We consider post-wargame Likert survey data from student participants and external observer notes to identify key planning decision points, document medical evacuation lessons learned, and quantify general utility. Results indicate that MEWI participation substantially improves uptake of medical evacuation lessons learned and co-operative decision-making. MEWI is a substantial step forward in the field of high-fidelity training tools for medical education, and our study findings offer critical insights into improving medical evacuation education and operations across the joint force.
Authors:Jiapeng Yao, Lantian Zhang, Jiping Huang
Abstract:
As organizations increasingly seek to leverage machine learning (ML) capabilities, the technical complexity of implementing ML solutions creates significant barriers to adoption and impacts operational efficiency. This research examines how Large Language Models (LLMs) can transform the accessibility of ML technologies within organizations through a human-centered Automated Machine Learning (AutoML) approach. Through a comprehensive user study involving 15 professionals across various roles and technical backgrounds, we evaluate the organizational impact of an LLM-based AutoML framework compared to traditional implementation methods. Our research offers four significant contributions to both management practice and technical innovation: First, we present pioneering evidence that LLM-based interfaces can dramatically improve ML implementation success rates, with 93.34% of users achieved superior performance in the LLM condition, with 46.67% showing higher accuracy (10-25% improvement over baseline) and 46.67% demonstrating significantly higher accuracy (>25% improvement over baseline), while 6.67% maintained comparable performance levels; and 60% reporting substantially reduced development time. Second, we demonstrate how natural language interfaces can effectively bridge the technical skills gap in organizations, cutting implementation time by 50% while improving accuracy across all expertise levels. Third, we provide valuable insights for organizations designing human-AI collaborative systems, showing that our approach reduced error resolution time by 73% and significantly accelerated employee learning curves. Finally, we establish empirical support for natural language as an effective interface for complex technical systems, offering organizations a path to democratize ML capabilities without compromising quality or performance.
Authors:Andrey Titov, Tina N. H. Nantenaina, Marta Kersten-Oertel, Simon Drouin
Abstract:
Visualizing 3D medical images is challenging due to self-occlusion, where anatomical structures of interest can be obscured by surrounding tissues. Existing methods, such as slicing and interactive clipping, are limited in their ability to fully represent internal anatomy in context. In contrast, hand-drawn medical illustrations in anatomy books manage occlusion effectively by selectively removing portions based on tissue type, revealing 3D structures while preserving context. This paper introduces AnatomyCarve, a novel technique developed for a VR environment that creates high-quality illustrations similar to those in anatomy books, while remaining fast and interactive. AnatomyCarve allows users to clip selected segments from 3D medical volumes, preserving spatial relations and contextual information. This approach enhances visualization by combining advanced rendering techniques with natural user interactions in VR. Usability of AnatomyCarve was assessed through a study with non-experts, while surgical planning effectiveness was evaluated with practicing neurosurgeons and residents. The results show that AnatomyCarve enables customized anatomical visualizations, with high user satisfaction, suggesting its potential for educational and clinical applications.
Authors:Aiur Nanzatov, Lourdes Peña-Castillo, Oscar Meruvia-Pastor
Abstract:
Two-factor authentication (2FA) has become widely adopted as an efficient and secure way to validate someone's identity online. Two-factor authentication is difficult in virtual reality (VR) because users are usually wearing a head-mounted display (HMD) which does not allow them to see their real-world surroundings. We present NRXR-ID, a technique to implement two-factor authentication while using extended reality systems and smartphones. The proposed method allows users to complete an authentication challenge using their smartphones without removing their HMD. We performed a user study where we explored four types of challenges for users, including a novel checkers-style challenge. Users responded to these challenges under three different configurations, including a technique that uses the smartphone to support gaze-based selection without the use of VR controllers. A 4X3 within-subjects design allowed us to study all the variations proposed. We collected performance metrics and performed user experience questionnaires to collect subjective impressions from 30 participants. Results suggest that the checkers-style visual matching challenge was the most appropriate option, followed by entering a digital PIN challenge submitted via the smartphone and answered within the VR environment.
Authors:Kadija Bouyzourn, Alexandra Birch
Abstract:
This mixed-methods inquiry examined four domains that shape university students' trust in ChatGPT: user attributes, seven delineated trust dimensions, task context, and perceived societal impact. Data were collected through a survey of 115 UK undergraduate and postgraduate students and four complementary semi-structured interviews. Behavioural engagement outweighed demographics: frequent use increased trust, whereas self-reported understanding of large-language-model mechanics reduced it. Among the dimensions, perceived expertise and ethical risk were the strongest predictors of overall trust; ease of use and transparency had secondary effects, while human-likeness and reputation were non-significant. Trust was highly task-contingent; highest for coding and summarising, lowest for entertainment and citation generation, yet confidence in ChatGPT's referencing ability, despite known inaccuracies, was the single strongest correlate of global trust, indicating automation bias. Computer-science students surpassed peers only in trusting the system for proofreading and writing, suggesting technical expertise refines rather than inflates reliance. Finally, students who viewed AI's societal impact positively reported the greatest trust, whereas mixed or negative outlooks dampened confidence. These findings show that trust in ChatGPT hinges on task verifiability, perceived competence, ethical alignment and direct experience, and they underscore the need for transparency, accuracy cues and user education when deploying LLMs in academic settings.
Authors:Celeste Campos-Castillo, Xuan Kang, Linnea I. Laestadius
Abstract:
Recently, research into chatbots (also known as conversational agents, AI agents, voice assistants), which are computer applications using artificial intelligence to mimic human-like conversation, has grown sharply. Despite this growth, sociology lags other disciplines (including computer science, medicine, psychology, and communication) in publishing about chatbots. We suggest sociology can advance understanding of human-chatbot interaction and offer four sociological theories to enhance extant work in this field. The first two theories (resource substitution theory, power-dependence theory) add new insights to existing models of the drivers of chatbot use, which overlook sociological concerns about how social structure (e.g., systemic discrimination, the uneven distribution of resources within networks) inclines individuals to use chatbots, including problematic levels of emotional dependency on chatbots. The second two theories (affect control theory, fundamental cause of disease theory) help inform the development of chatbot-driven interventions that minimize safety risks and enhance equity by leveraging sociological insights into how chatbot outputs could attend to cultural contexts (e.g., affective norms) to promote wellbeing and enhance communities (e.g., opportunities for civic participation). We discuss the value of applying sociological theories for advancing theorizing about human-chatbot interaction and developing chatbots for social good.
Authors:Greg Nyilasy, Harsha Gangadharbatla
Abstract:
As AI hype continues to grow, organizations face pressure to broadcast or downplay purported AI initiatives - even when contrary to truth. This paper introduces AI-washing as overstating (deceptive boasting) or understating (deceptive denial) a company's real AI usage. A 2x2 experiment (N = 401) examines how these false claims affect consumer attitudes and purchase intentions. Results reveal a pronounced asymmetry: deceptive denial evokes more negative moral judgments than honest negation, while deceptive boasting has no effects. We show that perceived betrayal mediates these outcomes. By clarifying how AI-washing erodes trust, the study highlights clear ethical implications for policymakers, marketers, and researchers striving for transparency.
Authors:Yichen Yu, Huan-Song Xu
Abstract:
Visually impaired individuals often require a guide runner to safely participate in outdoor running. However, maintaining synchronized pacing with verbal cues or tethers can be mentally taxing and physically restrictive. Existing solutions primarily focus on navigation or obstacle avoidance but overlook the importance of real-time interpersonal rhythm coordination during running. We introduce RunPacer, a smartwatch-based vibrotactile feedback system that delivers synchronized rhythmic pulses to both runners. In contrast to conventional guide-running systems that rely heavily on continuous verbal communication or mechanical tethering, RunPacer emphasizes interpersonal cadence alignment as its core interaction model. By pre-setting a target step frequency or dynamically adapting to the guide's natural pace, the system ensures that both runners receive identical haptic cues, enabling them to maintain coordinated motion intuitively and efficiently. This poster presents the system architecture, positions it within prior research on haptic entrainment, and outlines the vision for future field deployment, including potential multimodal feedback extensions. RunPacer contributes a lightweight, socially cooperative, and non-visual assistive framework that reimagines co-running as a shared, embodied, and accessible experience.
Authors:Sirina HÃ¥land, Trond Karlsen Strøm, Petra GaluÅ¡Äáková
Abstract:
Although the amount of available spoken content is steadily increasing, extracting information and knowledge from speech recordings remains challenging. Beyond enhancing traditional information retrieval methods such as speech search and keyword spotting, novel approaches for navigating and searching spoken content need to be explored and developed. In this paper, we propose a novel navigational method for speech archives that leverages recent advances in language and multimodal generative models. We demonstrate our approach with a Web application that organizes data into a structured format using interactive mind maps and image generation tools. The system is implemented using the TED-LIUM~3 dataset, which comprises over 2,000 speech transcripts and audio files of TED Talks. Initial user tests using a System Usability Scale (SUS) questionnaire indicate the application's potential to simplify the exploration of large speech collections.
Authors:Dani Paul Hove, Benjamin Watson
Abstract:
Video conferencing has become a central part of our daily lives, thanks to the COVID-19 pandemic. Unfortunately, so have its many limitations, resulting in poor support for communicative and social behavior and ultimately, Zoom fatigue. New technologies will be required to address these limitations, including many drawn from mixed reality (XR). In this paper, our goals are to equip and encourage future researchers to develop and test such technologies. Toward this end, we first survey research on the shortcomings of video conferencing systems, as defined before and after the pandemic. We then consider the methods that research uses to evaluate support for communicative behavior, and argue that those same methods should be employed in identifying, improving, and validating promising video conferencing technologies. Next, we survey emerging XR solutions to video conferencing's limitations, most off which do not employ head-mounted displays.
Authors:Ronald J. Pandolfi, Jeffrey J. Donatelli, Julian Todd, Daniela Ushizima
Abstract:
ASCRIBE-XR, a novel computational platform designed to facilitate the visualization and exploration of 3D volumetric data and mesh data in the context of synchrotron experiments, is described. Using Godot and PC-VR technologies, the platform enables users to dynamically load and manipulate 3D data sets to gain deeper insights into their research. The program's multi-user capabilities, enabled through WebRTC, and MQTT, allow multiple users to share data and visualize together in real-time, promoting a more interactive and engaging research experience. We describe the design and implementation of ASCRIBE-XR, highlighting its key features and capabilities. We will also discuss its utility in the context of synchrotron research, including examples of its application and potential benefits for the scientific community.
Authors:Amr Mohamed, Maram Assi, Mariam Guizani
Abstract:
Large language model assistants (LLM-assistants) present new opportunities to transform software development. Developers are increasingly adopting these tools across tasks, including coding, testing, debugging, documentation, and design. Yet, despite growing interest, there is no synthesis of how LLM-assistants affect software developer productivity. In this paper, we present a systematic literature review of 37 peer-reviewed studies published between January 2014 and December 2024 that examine this impact. Our analysis reveals that LLM-assistants offer both considerable benefits and critical risks. Commonly reported gains include minimized code search, accelerated development, and the automation of trivial and repetitive tasks. However, studies also highlight concerns around cognitive offloading, reduced team collaboration, and inconsistent effects on code quality. While the majority of studies (92%) adopt a multi-dimensional perspective by examining at least two SPACE dimensions, reflecting increased awareness of the complexity of developer productivity, only 14% extend beyond three dimensions, indicating substantial room for more integrated evaluations. Satisfaction, Performance, and Efficiency are the most frequently investigated dimensions, whereas Communication and Activity remain underexplored. Most studies are exploratory (64%) and methodologically diverse, but lack longitudinal and team-based evaluations. This review surfaces key research gaps and provides recommendations for future research and practice. All artifacts associated with this study are publicly available at https://zenodo.org/records/15788502.
Authors:Don Roosan, Tiffany Khao, Huong Phan, Yan Li
Abstract:
Noncompliance with medication regimens poses an immense challenge in the management of chronic diseases, often resulting in exacerbated health complications and recurrent hospital admissions. Addressing this gap, our team designed an innovative mobile game aimed at bolstering medication adherence and information retention within the general population. Employing Amazon Mechanical Turk, participants were enlisted and allocated into two cohorts: one engaged with our mobile game and the other perused an informational pamphlet about medication. Both cohorts underwent a pre-intervention quiz, followed by their respective interventions, and concluded with a post-intervention quiz. Primary outcome measures included the difference in quiz scores and the game play duration. The investigation encompassed 243 participants with homogenous baseline attributes. Participants interacting with the mobile game depicted a significant enhancement in their post-intervention scores compared to the pre-intervention scores. We observed a notable correlation of 0.346 (p<0.001) with a robust medium effect size of 0.641 (0.503 - 0.779). Although the duration of game play and post-intervention scores didn't exhibit a direct correlation, a tendency towards superior post-intervention scores was evident among participants who dedicated more time to the game. The interactive mobile game we developed exhibits potential as an engaging instrument for empowering patients and caregivers. Providing critical medication information and the potential side effects in a manner that increases retention would thereby mitigate medication noncompliance. Future research endeavors should focus on optimizing and broadening the application of such mobile interfaces to fortify public health initiatives.
Authors:Keita Kiuchi, Yoshikazu Fujimoto, Hideyuki Goto, Tomonori Hosokawa, Makoto Nishimura, Yosuke Sato, Izumi Sezai
Abstract:
This study provides the first comprehensive evaluation of large language model (LLM) performance across three counseling roles in Japanese-language therapeutic contexts. We simultaneously assessed counselor artificial intelligence (AI) systems (GPT-4-turbo with zeroshot prompting or Structured Multi-step Dialogue Prompts (SMDP), Claude-3-Opus-SMDP), client AI simulations, and evaluation AI systems (o3, Claude-3.7-Sonnet, Gemini-2.5-pro). Human experts (n = 15) with extensive counseling experience evaluated AI-generated dialogues using the Motivational Interviewing Treatment Integrity (MITI) Coding Manual 4.2.1.
Notably, SMDP implementation significantly enhanced counselor AI performance across all MITI global ratings compared with zeroshot prompting, with no significant differences between GPT-SMDP and Opus-SMDP. Evaluation AIs showed comparable performance to human raters for Cultivating Change Talk but systematically overestimated Softening Sustain Talk and the overall quality metrics. Model-specific biases emerged: Gemini emphasized power-sharing, o3 focused on technical proficiency, and Sonnet prioritized emotional expression. Client AI simulations exhibited a limited emotional range and unnaturally high compliance, indicating the need for enhanced realism.
These findings establish benchmarks for AI-assisted counseling in non-English contexts and identify critical areas for improvement through advanced prompt engineering, retrieval-augmented generation, and targeted fine-tuning, with important implications for developing culturally sensitive AI mental health tools.
Authors:Steve Devènes, Marine Capallera, Robin Cherix, Elena Mugellini, Omar Abou Khaled, Francesco Carrino
Abstract:
The loss of knowledge when skilled operators leave poses a critical issue for companies. This know-how is diverse and unstructured. We propose a novel method that combines knowledge graph embeddings and multi-modal interfaces to collect and retrieve expertise, making it actionable. Our approach supports decision-making on the shop floor. Additionally, we leverage LLMs to improve query understanding and provide adapted answers. As application case studies, we developed a proof-of-concept for quality control in high precision manufacturing.
Authors:Andrew Schwabe, Ãzgür Akgün, Ella Haig
Abstract:
Many e-learning platforms assert their ability or potential to improve students' self-regulated learning (SRL), however the cyclical and undirected nature of SRL theoretical models represent significant challenges for representation within contemporary machine learning frameworks. We apply SRL-informed features to trace data in order to advance modelling of students' SRL activities, to improve predictability and explainability regarding the causal effects of learning in an eLearning environment. We demonstrate that these features improve predictive accuracy and validate the value of further research into cyclic modelling techniques for SRL.
Authors:Dinuo Liao, James Derek Lomas, Cehao Yu
Abstract:
This study explores how Low-Rank Adaptation (LoRA) fine-tuning, guided by human aesthetic evaluations, can enhance the outputs of generative AI models in tangible product design, using lamp design as a case study. By integrating human feedback into the AI model, we aim to improve both the desirability and aesthetic appeal of the generated designs. Comprehensive experiments were conducted, starting with prompt optimization techniques and focusing on LoRA fine-tuning of the Stable Diffusion model. Additionally, methods to convert AI-generated designs into tangible products through 3D realization using 3D printing technologies were investigated. The results indicate that LoRA fine-tuning effectively aligns AI-generated designs with human aesthetic preferences, leading to significant improvements in desirability and aesthetic appeal scores. These findings highlight the potential of human-AI collaboration in tangible product design and provide valuable insights into integrating human feedback into AI design processes.
Authors:Ahmed G. Habashi, Ahmed M. Azab, Seif Eldawlatly, Gamal M. Aly
Abstract:
Cross-subject motor imagery (CS-MI) classification in brain-computer interfaces (BCIs) is a challenging task due to the significant variability in Electroencephalography (EEG) patterns across different individuals. This variability often results in lower classification accuracy compared to subject-specific models, presenting a major barrier to developing calibration-free BCIs suitable for real-world applications. In this paper, we introduce a novel approach that significantly enhances cross-subject MI classification performance through optimized preprocessing and deep learning techniques. Our approach involves direct classification of Short-Time Fourier Transform (STFT)-transformed EEG data, optimized STFT parameters, and a balanced batching strategy during training of a Convolutional Neural Network (CNN). This approach is uniquely validated across four different datasets, including three widely-used benchmark datasets leading to substantial improvements in cross-subject classification, achieving 67.60% on the BCI Competition IV Dataset 1 (IV-1), 65.96% on Dataset 2A (IV-2A), and 80.22% on Dataset 2B (IV-2B), outperforming state-of-the-art techniques. Additionally, we systematically investigate the classification performance using MI windows ranging from the full 4-second window to 1-second windows. These results establish a new benchmark for generalizable, calibration-free MI classification in addition to contributing a robust open-access dataset to advance research in this domain.
Authors:Hao Tang, Songyun Xie, Xinzhou Xie, Can Liao, Xin Zhang, Bohan Li, Zhongyu Tian, Dalu Zheng
Abstract:
Traditional video-induced emotion physiological datasets often use whole-trial annotation, assigning a single emotion label to all data collected during an entire trial. This coarse-grained annotation approach misaligns with the dynamic and temporally localized nature of emotional responses as they unfold with video narratives, introducing label noise that limits emotion recognition algorithm evaluation and performance. To solve the label noise problem caused by coarse-grained annotation, we propose a fine-grained annotation method through an immediate recall paradigm. This paradigm integrates an immediate video replay phase after the initial stimulus viewing, allowing participants to precisely mark the onset timestamp, emotion label, and intensity based on their immediate recall. We validate this paradigm through physiological evidence and recognition performance. Physiological validation of multimodal signals within participant-marked windows revealed rhythm-specific EEG patterns and arousal-dependent GSR responses-with SCRs appearing in 91% of high-arousal versus 6% of low-arousal emotion windows. These objective physiological data changes strongly aligned with subjective annotations, confirming annotation precision. For recognition performance, classification experiments showed that models trained on fine-grained annotations achieved 9.7% higher accuracy than traditional whole-trial labeling, despite using less data. This work not only addresses label noise through fine-grained annotation but also demonstrates that annotation precision outweighs data scale in determining emotion recognition performance.
Authors:Tim Rogers, Ben Teehankee
Abstract:
This paper examines a critical yet unexplored dimension of the AI alignment problem: the potential for Large Language Models (LLMs) to inherit and amplify existing misalignments between human espoused theories and theories-in-use. Drawing on action science research, we argue that LLMs trained on human-generated text likely absorb and reproduce Model 1 theories-in-use - a defensive reasoning pattern that both inhibits learning and creates ongoing anti-learning dynamics at the dyad, group, and organisational levels. Through a detailed case study of an LLM acting as an HR consultant, we show how its advice, while superficially professional, systematically reinforces unproductive problem-solving approaches and blocks pathways to deeper organisational learning. This represents a specific instance of the alignment problem where the AI system successfully mirrors human behaviour but inherits our cognitive blind spots. This poses particular risks if LLMs are integrated into organisational decision-making processes, potentially entrenching anti-learning practices while lending authority to them. The paper concludes by exploring the possibility of developing LLMs capable of facilitating Model 2 learning - a more productive theory-in-use - and suggests this effort could advance both AI alignment research and action science practice. This analysis reveals an unexpected symmetry in the alignment challenge: the process of developing AI systems properly aligned with human values could yield tools that help humans themselves better embody those same values.
Authors:Nathan Kawamoto, Daniel Hoover, Jonathan Xie, Jacob Walters, Katie Snyder, Aditi Verma
Abstract:
As fusion energy technologies approach demonstration and commercial deployment, understanding public perspectives on future fusion facilities will be critical for achieving social license, especially because fusion energy facilities, unlike large fission reactors, may be sited in closer proximity to people and communities, due to distinct regulatory frameworks. In a departure from the 'decide-announce-defend' approach typically used to site energy infrastructure, we develop a participatory design methodology for collaboratively designing fusion energy facilities with prospective host communities. We present here our findings from a participatory design workshop that brought together 22 community participants and 34 engineering students. Our analysis of the textual and visual data from this workshop shows a range of design values and decision-making criteria with 'integrity' and 'respect' ranking highest among values and 'economic benefits' and 'environmental protection/safety' ranking highest among decision-making criteria. Salient design themes that emerge across facility concepts include connecting the history and legacy of the community to the design of the facility, care for workers, transparency and access to the facility, and health and safety of the host community. Participants reported predominantly positive sentiments, expressing joy and surprise as the workshop progressed from learning about fusion to designing the hypothetical facility. Our findings suggest that carrying out participatory design in the early stages of technology development can invite and make concrete public hopes and concerns, improve understanding of, and curiosity about, an emerging technology, build toward social license, and inform context-specific development of fusion energy facilities.
Authors:Shan Li, Guozhu Ding
Abstract:
This study introduces and evaluates Healthy Choice, an innovative theory-driven and AI-enhanced simulation platform designed to cultivate nutrition literacy through interactive scenario-based learning experiences. We collected feedback from 114 university students with diverse backgrounds who completed simulated product selection scenarios. Quantitative ratings of usefulness and ease of use demonstrated high user satisfaction.
Authors:Chris Duckworth, Zlatko Zlatev, James Sciberras, Peter Hallett, Enrico Gerding
Abstract:
Purpose: Financial service companies manage huge volumes of data which requires timely error identification and resolution. The associated tasks to resolve these errors frequently put financial analyst workforces under significant pressure leading to resourcing challenges and increased business risk. To address this challenge, we introduce a formal task allocation model which considers both business orientated goals and analyst well-being.
Methodology: We use a Genetic Algorithm (GA) to optimise our formal model to allocate and schedule tasks to analysts. The proposed solution is able to allocate tasks to analysts with appropriate skills and experience, while taking into account staff well-being objectives.
Findings: We demonstrate our GA model outperforms baseline heuristics, current working practice, and is applicable to a range of single and multi-objective real-world scenarios. We discuss the potential for metaheuristics (such as GAs) to efficiently find sufficiently good allocations which can provide recommendations for financial service managers in-the-loop.
Originality: A key gap in existing allocation and scheduling models, is fully considering worker well-being. This paper presents an allocation model which explicitly optimises for well-being while still improving on current working practice for efficiency.
Authors:Sanjay Krishna Anbalagan, Xinrui Nie, Umesh Mohan, Vijay Kumar Kanamarlapudi, Anughna Kommalapati, Xiaodan Zhao
Abstract:
Domain specific chatbot applications often involve multi step interactions, such as refining search filters, selecting multiple items, or performing comparisons. Traditional graphical user interfaces (GUIs) handle these workflows by providing explicit "Submit" (commit data) and "Reset" (discard data) actions, allowing back-end systems to track user intent unambiguously. In contrast, conversational agents rely on subtle language cues, which can lead to confusion and incomplete context management. This paper proposes modeling these GUI inspired metaphors acknowledgment (submit like) and context switching (reset-like) as explicit tasks within large language model (LLM) prompts. By capturing user acknowledgment, reset actions, and chain of thought (CoT) reasoning as structured session data, we preserve clarity, reduce user confusion, and align domain-specific chatbot interactions with back-end logic. We demonstrate our approach in hotel booking and customer management scenarios, highlighting improvements in multi-turn task coherence, user satisfaction, and efficiency.
Authors:Beatriz Severes, Ana O. Henriques, Rory Clark, Paulo Bala, Anna Carter, Rua Mae Williams, Geraldine Fitzpatrick
Abstract:
Academic well-being is deeply influenced by peer-support networks, yet they remain informal, inequitable, and unsustainable, often relying on personal connections and social capital rather than structured, inclusive systems. Additionally, institutional well-being responses frequently focus on student populations, neglecting the emotional labour of faculty and staff, reinforcing an exclusionary academic culture. Drawing on HCI methodologies, participatory design, and care ethics, this workshop will provide a space for rethinking how academic communities can support inclusive networks. Through pre-workshop engagement, co-design activities, and reflection, participants will examine systemic gaps in networks and explore ways to embed care, equity, and sustainability into academic peer-support frameworks -- from informal, exclusionary models to structured, inclusive care-based ecosystems. At the end of the workshop, participants will co-develop design strategies for integrating care and resilience in academic ecosystems, resources for designing equitable support systems, and a peer network invested and committed to fostering a supportive academic community.
Authors:Wen Zhan, Ziqun Hua, Peiyue Lin, Yunfei Chen
Abstract:
This paper explores how older adults, particularly aging migrants in urban China, can engage AI-assisted co-creation to express personal narratives that are often fragmented, underrepresented, or difficult to verbalize. Through a pilot workshop combining oral storytelling and the symbolic reconstruction of Hanzi, participants shared memories of migration and recreated new character forms using Xiaozhuan glyphs, suggested by the Large Language Model (LLM), together with physical materials. Supported by human facilitation and a soft AI presence, participants transformed lived experience into visual and tactile expressions without requiring digital literacy. This approach offers new perspectives on human-AI collaboration and aging by repositioning AI not as a content producer but as a supportive mechanism, and by supporting narrative agency within sociotechnical systems.
Authors:Yoonseok Yang, Minjune Kim, Marlon Rondinelli, Keren Shao
Abstract:
Grading handwritten, open-ended responses remains a major bottleneck in large university STEM courses. We introduce Pensieve (https://www.pensieve.co), an AI-assisted grading platform that leverages large language models (LLMs) to transcribe and evaluate student work, providing instructors with rubric-aligned scores, transcriptions, and confidence ratings. Unlike prior tools that focus narrowly on specific tasks like transcription or rubric generation, Pensieve supports the entire grading pipeline-from scanned student submissions to final feedback-within a human-in-the-loop interface.
Pensieve has been deployed in real-world courses at over 20 institutions and has graded more than 300,000 student responses. We present system details and empirical results across four core STEM disciplines: Computer Science, Mathematics, Physics, and Chemistry. Our findings show that Pensieve reduces grading time by an average of 65%, while maintaining a 95.4% agreement rate with instructor-assigned grades for high-confidence predictions.
Authors:Matthew JY Kang, Wenli Yang, Monica R Roberts, Byeong Ho Kang, Charles B Malpas
Abstract:
The recent boom of large language models (LLMs) has re-ignited the hope that artificial intelligence (AI) systems could aid medical diagnosis. Yet despite dazzling benchmark scores, LLM assistants have yet to deliver measurable improvements at the bedside. This scoping review aims to highlight the areas where AI is limited to make practical contributions in the clinical setting, specifically in dementia diagnosis and care.
Standalone machine-learning models excel at pattern recognition but seldom provide actionable, interpretable guidance, eroding clinician trust. Adjacent use of LLMs by physicians did not result in better diagnostic accuracy or speed. Key limitations trace to the data-driven paradigm: black-box outputs which lack transparency, vulnerability to hallucinations, and weak causal reasoning. Hybrid approaches that combine statistical learning with expert rule-based knowledge, and involve clinicians throughout the process help bring back interpretability. They also fit better with existing clinical workflows, as seen in examples like PEIRS and ATHENA-CDS.
Future decision-support should prioritise explanatory coherence by linking predictions to clinically meaningful causes. This can be done through neuro-symbolic or hybrid AI that combines the language ability of LLMs with human causal expertise. AI researchers have addressed this direction, with explainable AI and neuro-symbolic AI being the next logical steps in further advancement in AI. However, they are still based on data-driven knowledge integration instead of human-in-the-loop approaches. Future research should measure success not only by accuracy but by improvements in clinician understanding, workflow fit, and patient outcomes. A better understanding of what helps improve human-computer interactions is greatly needed for AI systems to become part of clinical practice.
Authors:Paul C. Parsons, Arran Ridley
Abstract:
Professional visualization design has become an increasingly important area of inquiry, yet much of the field's discourse remains anchored in researcher-centered contexts. Studies of design practice often focus on individual designers' decisions and reflections, offering limited insight into the collaborative and systemic dimensions of professional work. In this paper, we propose a systems-level reframing of design judgment grounded in the coordination and adaptation that sustain progress amid uncertainty, constraint, and misalignment. Drawing on sustained engagement across multiple empirical studies--including ethnographic observation of design teams and qualitative studies of individual practitioners--we identify recurring episodes in which coherence was preserved not by selecting an optimal option, but by repairing alignment, adjusting plans, and reframing goals. We interpret these dynamics through the lens of Joint Cognitive Systems, which provide tools for analyzing how judgment emerges as a distributed capacity within sociotechnical activity. This perspective surfaces often-invisible work in visualization design and offers researchers a new conceptual vocabulary for studying how design activity is sustained in practice.
Authors:Yeonbin Son, Matthew L. Bolton
Abstract:
There is growing interest in explainable recommender systems that provide recommendations along with explanations for the reasoning behind them. When evaluating recommender systems, most studies focus on overall recommendation performance. Only a few assess the quality of the explanations. Explanation quality is often evaluated through user studies that subjectively gather users' opinions on representative explanatory factors that shape end-users' perspective towards the results, not about the explanation contents itself. We aim to fill this gap by developing an objective metric to evaluate Veracity: the information quality of explanations. Specifically, we decompose Veracity into two dimensions: Fidelity and Attunement. Fidelity refers to whether the explanation includes accurate information about the recommended item. Attunement evaluates whether the explanation reflects the target user's preferences. By applying signal detection theory, we first determine decision outcomes for each dimension and then combine them to calculate a sensitivity, which serves as the final Veracity value. To assess the effectiveness of the proposed metric, we set up four cases with varying levels of information quality to validate whether our metric can accurately capture differences in quality. The results provided meaningful insights into the effectiveness of our proposed metric.
Authors:Braden Roper, William Thompson, Chris Weaver
Abstract:
Game-Based Learning has proven to be an effective method for enhancing engagement with educational material. However, gaining a deeper understanding of player strategies remains challenging. Sequential game-state and action-based tracking tools often gather extensive data that can be difficult to interpret as long-term strategy. This data presents unique problems to visualization, as it can be fairly natural, noisy data but is constrained within synthetic, controlled environments, leading to issues such as overplotting which can make interpretation complicated. We propose an animated visual encoding tool that utilizes kinetic visualization to address these issues. This tool enables researchers to construct animated data narratives through the configuration of parameter interpolation curves and blending layers. Finally, we demonstrate the usefulness of the tool while addressing specific interests as outlined by a domain expert collaborator.
Authors:Haosen Xing, Haoran Ma, Sijin Zhang, Hartmut Geyer
Abstract:
Current control strategies for powered lower limb prostheses often lack awareness of the environment and the user's intended interactions with it. This limitation becomes particularly apparent in complex terrains. Obstacle negotiation, a critical scenario exemplifying such challenges, requires both real-time perception of obstacle geometry and responsiveness to user intention about when and where to step over or onto, to dynamically adjust swing trajectories. We propose a novel control strategy that fuses environmental awareness and human cooperativeness: an on-board depth camera detects obstacles ahead of swing phase, prompting an elevated early-swing trajectory to ensure clearance, while late-swing control defers to natural biomechanical cues from the user. This approach enables intuitive stepping strategies without requiring unnatural movement patterns. Experiments with three non-amputee participants demonstrated 100 percent success across more than 150 step-overs and 30 step-ons with randomly placed obstacles of varying heights (4-16 cm) and distances (15-70 cm). By effectively addressing obstacle navigation -- a gateway challenge for complex terrain mobility -- our system demonstrates adaptability to both environmental constraints and user intentions, with promising applications across diverse locomotion scenarios.
Authors:Megan T. deBettencourt, Sruthi Sakthivel, Emily A. Holmes, Mark Chevillet
Abstract:
Trauma prevalence is vast globally. Evidence-based digital treatments can help, but most require human guidance. Human guides provide tailored instructions and responsiveness to internal cognitive states, but limit scalability. Can generative AI and neurotechnology provide a scalable alternative? Here we test ANTIDOTE, combining AI guidance and pupillometry to automatically deliver and monitor an evidence-based digital treatment, specifically the Imagery Competing Task Intervention (ICTI), to reduce intrusive memories after psychological trauma. One hundred healthy volunteers were exposed to videos of traumatic events and randomly assigned to an intervention or active control condition. As predicted, intervention participants reported significantly fewer intrusive memories over the following week. Post-hoc assessment against clinical rubrics confirmed the AI guide delivered the intervention successfully. Additionally, pupil size tracked intervention engagement and predicted symptom reduction, providing a candidate biomarker of intervention effectiveness. These findings open a path toward rigorous AI-guided digital interventions that can scale to trauma prevalence.
Authors:Fan Wang, Giulia Perugia, Yuan Feng, Wijnand IJsselsteijn
Abstract:
As social robots increasingly enter dementia care, concerns about deception, intentional or not, are gaining attention. Yet, how robotic design cues might elicit misleading perceptions in people with dementia, and how these perceptions arise, remains insufficiently understood. In this scoping review, we examined 26 empirical studies on interactions between people with dementia and physical social robots. We identify four key design cue categories that may influence deceptive impressions: cues resembling physiological signs (e.g., simulated breathing), social intentions (e.g., playful movement), familiar beings (e.g., animal-like form and sound), and, to a lesser extent, cues that reveal artificiality. Thematic analysis of user responses reveals that people with dementia often attribute biological, social, and mental capacities to robots, dynamically shifting between awareness and illusion. These findings underscore the fluctuating nature of ontological perception in dementia contexts. Existing definitions of robotic deception often rest on philosophical or behaviorist premises, but rarely engage with the cognitive mechanisms involved. We propose an empirically grounded definition: robotic deception occurs when Type 1 (automatic, heuristic) processing dominates over Type 2 (deliberative, analytic) reasoning, leading to misinterpretation of a robot's artificial nature. This dual-process perspective highlights the ethical complexity of social robots in dementia care and calls for design approaches that are not only engaging, but also epistemically respectful.
Authors:Philipp M. Zähl, Sabine Theis, Martin R. Wolf
Abstract:
Although software engineering research has focused on optimizing processes and technology, there is a growing recognition that human factors, particularly teamwork, also significantly impact optimization. Recent research suggests that developer personality has a strong influence on teamwork. In fact, personality considerations may have a greater impact on software development than processes and tools. This paper aims to design a study that measures the impact of HEXACO personality traits on the Teamwork Quality (TWQ) of software teams. A preliminary data collection (n=54) was conducted for this purpose. The analysis showed that several personality traits, as well as their composition, had a significant impact on TWQ. Additionally, other variables, such as the proportion of women and age distribution, also affected TWQ. The study's initial results demonstrate the usefulness and validity of the study design. The results also suggest several opportunities to improve teamwork in IT organizations and avenues for further research.
Authors:Deland Liu, Frigyes Samuel Racz, Zoe Lalji, Jose del R. Millan
Abstract:
Patients with amyotrophic lateral sclerosis (ALS) in the completely locked-in state (CLIS) can lose all reliable motor control and are left without any means of communication. It remains unknown whether non-invasive electroencephalogram (EEG) based brain-computer interfaces (BCIs) can support volitional communication in CLIS. Here, we show that a CLIS patient was able to operate an EEG-based BCI across multiple online sessions to respond to both general knowledge and personally relevant assistive questions. The patient delivered "Yes"/"No" responses by volitionally modulating alpha and beta band power at different channels, guided by real-time auditory feedback from the BCI. The patient communicated assistive needs above chance in all sessions, achieving a perfect score in the final session. Performance on general knowledge questions varied across sessions, with two sessions showing accurate and above-chance responses, while the first and last sessions remained at chance level. The patient also showed consistent modulation patterns over time. These findings suggest that non-invasive BCIs may offer a potential pathway for restoring basic communication in CLIS.
Authors:Zhuochao Peng, Jiaxin Xu, Jun Hu, Haian Xue, Laurens A. G. Kolks, Pieter M. A. Desmet
Abstract:
While recent research highlights the potential of social robots to support mood regulation, little is known about how prospective users view their integration into everyday life. To explore this, we conducted an exploratory case study that used a speculative robot concept "Mora" to provoke reflection and facilitate meaningful discussion about using social robots to manage subtle, day-to-day emotional experiences. We focused on the "Sunday Blues," a common dip in mood that occurs at the end of the weekend, as a relatable context in which to explore individuals' insights. Using a video prototype and a co-constructing stories method, we engaged 15 participants in imagining interactions with Mora and discussing their expectations, doubts, and concerns. The study surfaced a range of nuanced reflections around the attributes of social robots like empathy, intervention effectiveness, and ethical boundaries, which we translated into design considerations for future research and development in human-robot interaction.
Authors:Christopher M. Wegemer, Edward Halim, Jeff Burke
Abstract:
Political polarization undermines democratic civic education by exacerbating identity-based resistance to opposing viewpoints. Emerging AI technologies offer new opportunities to advance interventions that reduce polarization and promote political open-mindedness. We examined novel design strategies that leverage adaptive and emotionally-responsive civic narratives that may sustain students' emotional engagement in stories, and in turn, promote perspective-taking toward members of political out-groups. Drawing on theories from political psychology and narratology, we investigate how affective computing techniques can support three storytelling mechanisms: transportation into a story world, identification with characters, and interaction with the storyteller. Using a design-based research (DBR) approach, we iteratively developed and refined an AI-mediated Digital Civic Storytelling (AI-DCS) platform. Our prototype integrates facial emotion recognition and attention tracking to assess users' affective and attentional states in real time. Narrative content is organized around pre-structured story outlines, with beat-by-beat language adaptation implemented via GPT-4, personalizing linguistic tone to sustain students' emotional engagement in stories that center political perspectives different from their own. Our work offers a foundation for AI-supported, emotionally-sensitive strategies that address affective polarization while preserving learner autonomy. We conclude with implications for civic education interventions, algorithmic literacy, and HCI challenges associated with AI dialogue management and affect-adaptive learning environments.
Authors:Varsha Pendyala, Pedro Morgado, William Sethares
Abstract:
Voice interfaces integral to the human-computer interaction systems can benefit from speech emotion recognition (SER) to customize responses based on user emotions. Since humans convey emotions through multi-modal audio-visual cues, developing SER systems using both the modalities is beneficial. However, collecting a vast amount of labeled data for their development is expensive. This paper proposes a knowledge distillation framework called LightweightSER (LiSER) that leverages unlabeled audio-visual data for SER, using large teacher models built on advanced speech and face representation models. LiSER transfers knowledge regarding speech emotions and facial expressions from the teacher models to lightweight student models. Experiments conducted on two benchmark datasets, RAVDESS and CREMA-D, demonstrate that LiSER can reduce the dependence on extensive labeled datasets for SER tasks.
Authors:Patricia Piedade
Abstract:
This is the Proceedings of the Access InContext Workshop, which was held at the CHI'25 Conference on Human Factors in Computing Systems, in Yokohama, Japan, on April 26th 2025.
Authors:Paul Haimes
Abstract:
This paper introduces Chord Colourizer, a near real-time system that detects the musical key of an audio signal and visually represents it through a novel graphical user interface (GUI). The system assigns colours to musical notes based on Isaac Newton's original colour wheel, preserving historical links between pitch and hue, and also integrates an Arduino-controlled LED display using 3D-printed star-shaped diffusers to offer a physical ambient media representation. The method employs Constant-Q Transform (CQT) chroma features for chord estimation and visualization, followed by threshold-based filtering and tonal enhancement to isolate the root, third, and fifth. A confidence score is computed for each detection to ensure reliability, and only chords with moderate to very strong certainty are visualized. The graphical interface dynamically updates a colour-coded keyboard layout, while the LED display provides the same colour information via spatial feedback. This multi-modal system enhances user interaction with harmonic content, offering innovative possibilities for education and artistic performance. Limitations include slight latency and the inability to detect extended chords, which future development will aim to address through refined filtering, adaptive thresholds, and support for more complex harmonies such as sevenths and augmented chords. Future work will also explore integration with alternative visualization styles, and the comparison of audio analysis libraries to improve detection speed and precision. Plans also include formal user testing to evaluate perception, usability, and cross-cultural interpretations of colour-pitch mappings.
Authors:Stefan Pasch
Abstract:
Organizational efforts to utilize and operationalize artificial intelligence (AI) are often accompanied by substantial challenges, including scalability, maintenance, and coordination across teams. In response, the concept of Machine Learning Operations (MLOps) has emerged as a set of best practices that integrate software engineering principles with the unique demands of managing the ML lifecycle. Yet, empirical evidence on whether and how these practices support users in developing and operationalizing AI applications remains limited. To address this gap, this study analyzes over 8,000 user reviews of AI development platforms from G2.com. Using zero-shot classification, we measure review sentiment toward nine established MLOps practices, including continuous integration and delivery (CI/CD), workflow orchestration, reproducibility, versioning, collaboration, and monitoring. Seven of the nine practices show a significant positive relationship with user satisfaction, suggesting that effective MLOps implementation contributes tangible value to AI development. However, organizational context also matters: reviewers from small firms discuss certain MLOps practices less frequently, suggesting that organizational context influences the prevalence and salience of MLOps, though firm size does not moderate the MLOps-satisfaction link. This indicates that once applied, MLOps practices are perceived as universally beneficial across organizational settings.
Authors:Kalin Dimitrov
Abstract:
Pingmark defines a universal textual protocol for expressing spatial context through a minimal symbol: !@. Rather than embedding coordinates or using proprietary map links, Pingmark introduces a semantic trigger that compliant client applications interpret to generate a standardized resolver link of the form https://pingmark.me/lat/lon/[timestamp]. This allows location expression to function like existing textual conventions - @ for identity or # for topics - but for physical space. The protocol requires no user registration, relies on open mapping technologies, and protects privacy by generating location data ephemerally and locally. This paper presents the motivation, syntax, and design of the Pingmark Protocol Specification (PPS v0.1), its reference resolver implementation, and the long-term goal of establishing Pingmark as an open Internet standard for spatial mentions.
Authors:Tonmoy Ghosh
Abstract:
Password security has been compelled to evolve in response to the growing computational capabilities of modern systems. However, this evolution has often resulted in increasingly complex security practices that alienate users, leading to poor compliance and heightened vulnerability. Consequently, individuals remain exposed to attackers through weak or improperly managed passwords, underscoring the urgent need for a comprehensive defense mechanism that effectively addresses password-related risks and threats. In this paper, we propose a multifaceted solution designed to revolutionize password security by integrating diverse attributes such as the Password Dissection Mechanism, Dynamic Password Policy Mechanism, human behavioral patterns, device characteristics, network parameters, geographical context, and other relevant factors. By leveraging learning-based models, our framework constructs detailed user profiles capable of recognizing individuals and preventing nearly all forms of unauthorized access or device possession. The proposed framework enhances the usability-security paradigm by offering stronger protection than existing standards while simultaneously engaging users in the policy-setting process through a novel, adaptive approach.
Authors:Cathal Doyle
Abstract:
In the Generative Age, the nature of knowledge work is transforming. Traditional models that emphasise the organisation and retrieval of pre-existing information are increasingly inadequate in the face of generative AI (GenAI) systems capable of autonomous content creation. This paper introduces the Knowledge Sculptor (KS), a new professional archetype for Human-GenAI collaboration that transforms raw AI output into trustworthy, actionable knowledge. Grounded in a socio-technical perspective, the KS is conceptualised through a framework of competencies, including architecting a vision, iterative dialogue, information sculpting, and curiosity-driven synthesis. A practice-based vignette illustrates the KS role in action, and in a self-referential approach, the paper itself serves as an artefact of the sculpting process it describes.
Authors:Hitesh Mohapatra
Abstract:
This work presents AgroTrack, a LoRa-based IoT framework for remote livestock monitoring in smart agriculture. The system is designed for low-power, long-range communication and supports real-time tracking and basic health assessment of free-range livestock through GPS, motion, and temperature sensors integrated into wearable collars. Data is collected and transmitted via LoRa to gateways and forwarded to a cloud platform for visualization, alerts, and analytics. To enhance its practical deployment, AgroTrack incorporates advanced analytics, including machine learning models for predictive health alerts and behavioral anomaly detection. This integration transforms the framework from a basic monitoring tool into an intelligent decision-support system, enabling farmers to improve livestock management, operational efficiency, and sustainability in rural environments.
Authors:Russell Beale
Abstract:
As the world continues to change, more and more knowledge workers are embracing remote work. Yet this comes with its challenges for their productivity, and while many Task Management applications promise to improve the productivity of remote workers, it remains unclear how effective they are. Based on existing frameworks, this study investigated the productivity needs and challenges of remote knowledge workers and how they use Task Management tools. The research was conducted through a 2-week long, mixed-methods diary study and semi-structured interview. Perceptions of productivity, task management tool use and productivity challenges were observed. The findings show that using a digital Task Management application made no significant difference to using pen and paper for improving perceived productivity of remote workers and discuss the need for better personalization of Task Management applications.
Authors:Guillaume Rivière
Abstract:
The science of Human-Computer Interaction (HCI) is populated by isolated empirical findings, often tied to specific technologies, designs, and tasks. This situation probably lies in observing the wrong object of study, that is to say, observing interfaces rather than interaction. This paper proposes an experimental methodology, powered by a research methodology, that enables tackling the ambition of observing interaction (rather than interfaces). These observations are done during the treatment of applicative cases, allowing to generate and replicate results covering various experimental conditions, expressed from the need of end users and the evolution of technologies. Performing these observations when developing applicative prototypes illustrating novel technologies' utility allows, in the same time, to benefit from an optimization of these prototypes to better accomplish end users tasks. This paper depicts a long term research direction, from generating the initial observations of interaction properties and their replication, to their integration, that would then lead to exploring the possible relations existing between those properties, to end toward the description of human-computer interaction's physics.
Authors:Lonni Besançon
Abstract:
In this essay, I argue that, while visualization research does not seem to be directly at risk of being corrupted by the current massive wave of polluted research, certain visualization concepts are being used in fraudulent fashions and fields close to ours are being targeted. Worse, the society publishing our work is overwhelmed by thousands of questionable papers that are being, unfortunately, published. As a community, and if we want our research to remain as good as it currently is, I argue that we should all get involved with our variety of skills to help identify and correct the current scientific record. I thus aim to present a few questionable practices that are worth knowing about when reviewing for fields using visualization research, and hopefully will never be useful when reviewing for our main venues. I also argue that our skill set could become particularly relevant in the future and invite scholars of the fields to try to get involved.
Authors:Allen Daniel Sunny
Abstract:
This study explores the integration of contextual explanations into AI-powered loan decision systems to enhance trust and usability. While traditional AI systems rely heavily on algorithmic transparency and technical accuracy, they often fail to account for broader social and economic contexts. Through a qualitative study, I investigated user interactions with AI explanations and identified key gaps, in- cluding the inability of current systems to provide context. My findings underscore the limitations of purely technical transparency and the critical need for contex- tual explanations that bridge the gap between algorithmic outputs and real-world decision-making. By aligning explanations with user needs and broader societal factors, the system aims to foster trust, improve decision-making, and advance the design of human-centered AI systems
Authors:Jeff Brozena
Abstract:
Self-tracking is one of many behaviors involved in the long-term self-management of chronic illnesses. As consumer-grade wearable sensors have made the collection of health-related behaviors commonplace, the quality, volume, and availability of such data has dramatically improved. This exploratory longitudinal N-of-1 study quantitatively assesses four years of sleep data captured via the Oura Ring, a consumer-grade sleep tracking device, along with self-reported mood data logged using eMood Tracker for iOS. After assessing the data for stationarity and computing the appropriate lag-length selection, a vector autoregressive (VAR) model was fit along with Granger causality tests to assess causal mechanisms within this multivariate time series. Oura's nightly sleep quality score was shown to Granger-cause the presence of depressed and anxious moods using a VAR(2) model.
Authors:Christina Zhang
Abstract:
Many STEM concepts pose significant learning challenges to students due to their inherent complexity and abstract nature. Visualizing complex problems through animations can significantly enhance learning outcomes. However, the creation of animations can be time-consuming and inconvenient. Hence, many educators illustrate complex concepts by hand on a board or a digital device. Although static graphics are helpful for understanding, they are less effective than animations. The free and open-source Python package Manim enables educators to create visually compelling animations easily. Python's straightforward syntax, combined with Manim's comprehensive set of built-in classes and methods, greatly simplifies implementation. This article presents a series of examples that demonstrate how Manim can be used to create animated video lessons for a variety of topics in computer science and mathematics. In addition, it analyzes viewer feedback collected across multiple social media platforms to evaluate the effectiveness and accessibility of these visualizations. The article further explores broader potentials of the Manim Python library by showcasing demonstrations that extend its applications to subject areas beyond computer science and mathematics.
Authors:T. James Brandt
Abstract:
Adaptive chatbots that mimic a user's linguistic style can build rapport and engagement, yet unconstrained mimicry risks an agent that feels unstable or sycophantic. We present a computational evaluation framework that makes the core design tension explicit: balancing moment-to-moment linguistic synchrony against long-term persona stability. Using an 8-dimensional style vector and a closed-loop "base+delta" prompting architecture, we simulate and compare explicit adaptation policies - Uncapped, Cap, Exponential Moving Average (EMA), Dead-Band, and Hybrids - on a human-log dataset. Our analysis maps a clear Pareto frontier: bounded policies achieve substantial gains in stability at a modest cost to synchrony. For example, a Hybrid (EMA+Cap) raises stability from 0.542 to 0.878 (+62%) while reducing synchrony by only 17%. We confirm this trade-off through large-scale replications on three public corpora (DailyDialog, Persona-Chat, EmpatheticDialogues) and LLM-in-the-loop validation across two model families. Furthermore, we quantify "prompt legibility," showing that frontier policies reduce instruction churn and cut jarring register flips (major tone changes) from 0.254 to 0.092, yielding systems that are easier to reason about and maintain. Taken together, our framework provides a general evaluation harness for style adaptation; a systematic ablation that identifies Pareto-efficient policies; robust validation across diverse datasets and models; and novel legibility metrics linking policy choices to system maintainability.
Authors:Paul C. Parsons
Abstract:
Visualization research often centers on how visual representations generate insight, guide interpretation, or support decision-making. But in many real-world domains, visualizations do not stand out--they recede into the background, stabilized and trusted as part of the everyday infrastructure of work. This paper explores what it means to take such quiet roles seriously. Drawing on theoretical traditions from joint cognitive systems, naturalistic decision making, and infrastructure studies, I examine how visualization can become embedded in the rhythms of expert practice--less a site of intervention than a scaffold for attention, coordination, and judgment. I illustrate this reorientation with examples from mission control operations at NASA, where visualizations are deeply integrated but rarely interrogated. Rather than treat invisibility as a failure of design or innovation, I argue that visualization's infrastructural presence demands new concepts, methods, and critical sensibilities. The goal is not to diminish visualization's importance, but to broaden the field's theoretical repertoire--to recognize and support visualization-in-use even when it fades from view.
Authors:Kichang Lee
Abstract:
Large language models (LLMs) are emerging as everyday assistants, but their role as longitudinal virtual coaches is underexplored. This two-month single subject case study documents LLM guided half marathon preparation (July-September 2025). Using text based interactions and consumer app logs, the LLM acted as planner, explainer, and occasional motivator. Performance improved from sustaining 2 km at 7min 54sec per km to completing 21.1 km at 6min 30sec per km, with gains in cadence, pace HR coupling, and efficiency index trends. While causal attribution is limited without a control, outcomes demonstrate safe, measurable progress. At the same time, gaps were evident, no realtime sensor integration, text only feedback, motivation support that was user initiated, and limited personalization or safety guardrails. We propose design requirements for next generation systems, persistent athlete models with explicit guardrails, multimodal on device sensing, audio, haptic, visual feedback, proactive motivation scaffolds, and privacy-preserving personalization. This study offers grounded evidence and a design agenda for evolving LLMs from retrospective advisors to closed-loop coaching companions.
Authors:Prerna Luthra
Abstract:
We introduce a psychologically grounded and artist-informed framework for modeling visual creativity across four domains: Inner, Outer, Imaginative, and Moral Worlds. Drawing on interviews with practicing artists and theories from psychology, we define 12 traits that capture affective, symbolic, cultural, and ethical dimensions of creativity.Using 20k artworks from the SemArt dataset, we annotate images with GPT 4.1 using detailed, theory-aligned prompts, and evaluate the learnability of these traits from CLIP image embeddings. Traits such as Environmental Dialogicity and Redemptive Arc are predicted with high reliability ($R^2 \approx 0.64 - 0.68$), while others like Memory Imprint remain challenging, highlighting the limits of purely visual encoding. Beyond technical metrics, we visualize a "creativity trait-space" and illustrate how it can support interpretable, trait-aware co-creation - e.g., sliding along a Redemptive Arc axis to explore works of adversity and renewal. By linking cultural-aesthetic insights with computational modeling, our work aims not to reduce creativity to numbers, but to offer shared language and interpretable tools for artists, researchers, and AI systems to collaborate meaningfully.
Authors:Michel Youssef
Abstract:
We define a practical method to quantify the trade-off between security and operational friction in modern identity-centric programs. We introduce the Security Friction Quotient (SFQ), a bounded composite index that combines a residual-risk estimator with empirically grounded friction terms (latency, failure rate, and helpdesk impact). We establish clarity properties (boundedness, monotonic response, and weight identifiability) with short proofs, then evaluate widely used Conditional Access policies over a 12-week horizon using Monte Carlo simulation (n = 2,000 runs per policy/scenario) with effect sizes and 95% confidence intervals. We further assess rank stability under 10,000 random weight draws, finding 95.5% preservation of policy ordering. Finally, we provide a 12-week passkey field observation from an enterprise-scale cohort (N = 1,200) that directionally aligns with the simulation's phishing-resistant MFA gains. The SFQ framework is designed to be reproducible, interpretable, and directly actionable for Zero Trust identity policy decisions, with artifacts and parameter ranges provided to support policy design, review, and continuous improvement.
Authors:Kolawole Tytler
Abstract:
Clinicians face growing information overload from biomedical literature and guidelines, hindering evidence-based care. Retrieval-augmented generation (RAG) with large language models may provide fast, provenance-linked answers, but requires real-world evaluation. We describe iatroX, a UK-centred RAG-based clinical reference platform, and report early adoption, usability, and perceived clinical value from a formative implementation evaluation. Methods comprised a retrospective analysis of usage across web, iOS, and Android over 16 weeks (8 April-31 July 2025) and an in-product intercept survey. Usage metrics were drawn from web and app analytics with bot filtering. A client-side script randomized single-item prompts to approx. 10% of web sessions from a predefined battery assessing usefulness, reliability, and adoption intent. Proportions were summarized with Wilson 95% confidence intervals; free-text comments underwent thematic content analysis. iatroX reached 19,269 unique web users, 202,660 engagement events, and approx. 40,000 clinical queries. Mobile uptake included 1,960 iOS downloads and Android growth (peak >750 daily active users). The survey yielded 1,223 item-level responses: perceived usefulness 86.2% (95% CI 74.8-93.9%; 50/58); would use again 93.3% (95% CI 68.1-99.8%; 14/15); recommend to a colleague 88.4% (95% CI 75.1-95.9%; 38/43); perceived accuracy 75.0% (95% CI 58.8-87.3%; 30/40); reliability 79.4% (95% CI 62.1-91.3%; 27/34). Themes highlighted speed, guideline-linked answers, and UK specificity. Early real-world use suggests iatroX can mitigate information overload and support timely answers for UK clinicians. Limitations include small per-item samples and early-adopter bias; future work will include accuracy audits and prospective studies on workflow and care quality.
Authors:Jiexi Xu
Abstract:
The inherent non-deterministic nature of autonomous agents, particularly within low-code/no-code (LCNC) environments, presents significant reliability challenges. Agents can become trapped in unforeseen loops, generate inaccurate outputs, or encounter unrecoverable failures, leading to user frustration and a breakdown of trust. This report proposes a novel architectural pattern to address these issues: the integration of a secondary, "metacognitive" layer that actively monitors the primary LCNC agent. Inspired by human introspection, this layer is designed to predict impending task failures based on a defined set of triggers, such as excessive latency or repetitive actions. Upon predicting a failure, the metacognitive agent proactively initiates a human handoff, providing the user with a clear summary of the agent's "thought process" and a detailed explanation of why it could not proceed. An empirical analysis of a prototype system demonstrates that this approach significantly increases the overall task success rate. However, this performance gain comes with a notable increase in computational overhead. The findings reframe human handoffs not as an admission of defeat but as a core design feature that enhances system resilience, improves user experience, and builds trust by providing transparency into the agent's internal state. The report discusses the practical and ethical implications of this approach and identifies key directions for future research.
Authors:Yoichi Ochiai
Abstract:
This paper presents a case study of the thematic pavilion null2 at Expo 2025 Osaka-Kansai, contrasting with the static Jomon motifs of Taro Okamoto's Tower of the Sun from Expo 1970. The study discusses Yayoi-inspired mirror motifs and dynamically transforming interactive spatial configuration of null2, where visitors become integrated as experiential content. The shift from static representation to a new ontological and aesthetic model, characterized by the visitor's body merging in real-time with architectural space at installation scale, is analyzed. Referencing the philosophical context of Expo 1970 theme 'Progress and Harmony for Mankind,' this research reconsiders the worldview articulated by null2 in Expo 2025, in which computation is naturalized and ubiquitous, through its intersection with Eastern philosophical traditions. It investigates how immersive experiences within the pavilion, grounded in the philosophical framework of Digital Nature, reinterpret traditional spatial and structural motifs of the tea room, positioning them within contemporary digital art discourse. The aim is to contextualize and document null2 as an important contemporary case study from Expo practices, considering the historical and social background in Japan from the 19th to 21st century, during which world expositions served as pivotal points for the birth of modern Japanese concept of 'fine art,' symbolic milestones of economic development, and key moments in urban and media culture formation. Furthermore, this paper academically organizes architectural techniques, computer graphics methodologies, media art practices, and theoretical backgrounds utilized in null2, highlighting the scholarly significance of preserving these as an archival document for future generations.
Authors:Shan Jiang
Abstract:
Generative artificial intelligence (AI) has begun to reshape academic publishing by enabling the rapid production of submission-ready manuscripts. While such tools promise to enhance productivity, they also raise concerns about overwhelming journal systems that have fixed acceptance capacities. This paper uses simulation modeling to investigate how AI-driven surges in submissions may affect desk rejection rates, review cycles, and faculty publication portfolios, with a focus on business school journals and tenure processes. Three scenarios are analyzed: a baseline model, an Early Adopter model where a subset of faculty boosts productivity, and an AI Abuse model where submissions rise exponentially. Results indicate that early adopters initially benefit, but overall acceptance rates fall sharply as load increases, with tenure-track faculty facing disproportionately negative outcomes. The study contributes by demonstrating the structural vulnerabilities of the current publication system and highlights the need for institutional reform in personnel evaluation and research dissemination practices.
Authors:Peterson Jean
Abstract:
The increase in safety and critical systems improved Healthcare. Due to their risk of harm, such systems are subject to stringent guidelines and compliances. These safety measures ensure a seamless experience and mitigate the risk to end-users. Institutions like the Food and Drug Administration and the NHS, respectively, established international standards and competency frameworks to ensure industry compliance with these safety concerns. Medical device manufacturing is mainly concerned with standards. Consequently, these standards now advocate for better human factors considered in user interaction for medical devices. This forces manufacturers to rely on heavy testing and review to cover many of these factors during development. Sadly, many human factor risks will not be caught until proper testing in real life, which might be catastrophic in the case of an ambulatory device like the T34 syringe pump. Therefore, effort in formal methods research may propose new solutions in anticipating these errors in the early stages of development or even reducing their occurrence based on the use of standard generic model. These generically developed models will provide a common framework for safety integration in industry and may potentially be proven using formal verification mathematical proofs. This research uses SPARK Ada's formal verification tool against a behavioural model of the T34 syringe driver. A Generic Infusion Pump model refinement is explored and implemented in SPARK Ada. As a subset of the Ada language, the verification level of the end prototype is evaluated using SPARK. Exploring potential limitations defines the proposed model's implementation liability when considering abstraction and components of User Interface design in SPARK Ada.
Authors:Yue Jin
Abstract:
Emotions play a crucial role in human life. The research community has proposed many theories on emotions without reaching much consensus. The situation is similar for emotions in cognitive architectures and autonomous agents. I propose in this paper that emotions are recognized patterns of cognitive activities. These activities are responses of an agent to the deviations between the targets of its goals and the performances of its actions. Emotions still arise even if these activities are purely logical. I map the patterns of cognitive activities to emotions. I show the link between emotions and attention and the impacts of the parameterized functions in the cognitive architecture on the computing of emotions. My proposition bridges different theories on emotions and advances the building of consensus.
Authors:Mark G Orr
Abstract:
The focus of the current work concerned the psychological processes that underlie prediction of an events duration. The objective was to push forward existing psychological theory on event duration prediction, something made possible by the unique features of our data context. The provisional findings suggested that the prior, existing theoretical mechanism of event duration prediction is incomplete because: i. it does not support adaptive responses when event duration judgments are dependent, ii. it does not afford the integration of new, on the fly, information. Our findings suggest specific directions for future research.
Authors:Rashid Mushkani
Abstract:
Automation now steers building HVAC, distribution grids, and traffic signals, yet residents rarely have authority to pause or redirect these systems when they harm inclusivity, safety, or accessibility. We formalize a Right-to-Override (R2O) - defining override authorities, evidentiary thresholds, and domain-validated safe fallback states - and introduce a Deliberative Audit Method (DAM) with playbooks for pre-deployment walkthroughs, shadow-mode trials, and post-incident review. We instantiate R2O/DAM in simulations of smart-grid load shedding, building HVAC under occupancy uncertainty, and multi-agent traffic signals. R2O reduces distributional harm with limited efficiency loss: load-shedding disparity in unserved energy drops from 5.61x to 0.69x with constant curtailment; an override eliminates two discomfort-hours for seniors at an energy cost of 77 kWh; and median pedestrian wait falls from 90.4 s to 55.9 s with a 6.0 s increase in mean vehicle delay. We also contribute a policy standard, audit worksheets, and a ModelOps integration pattern to make urban automation contestable and reviewable.
Authors:Mouhacine Benosman
Abstract:
Artificial intelligence (AI), particularly in the form of large language models (LLMs) or chatbots, has become increasingly integrated into our daily lives. In the past five years, several LLMs have been introduced, including ChatGPT by OpenAI, Claude by Anthropic, and Llama by Meta, among others. These models have the potential to be employed across a wide range of human-machine interaction applications, such as chatbots for information retrieval, assistance in corporate hiring decisions, college admissions, financial loan approvals, parole determinations, and even in medical fields like psychotherapy delivered through chatbots. The key question is whether these chatbots will interact with humans in a bias-free manner or if they will further reinforce the existing pathological biases present in human-to-human interactions. If the latter is true, then how can we rigorously measure these biases? We address this challenge by introducing STAMP-LLM (Standardized Test and Assessment Measurement Protocol for LLMs), a psychometric-based principled two-phase framework for designing psychometric measures to evaluate chatbot biases: (i) a Definitional phase for construct mapping, item development, and expert review; and (ii) a Data/Analysis phase for protocol control (prompts/decoding), automated sampling, pre-specified scoring, and basic reliability/validity checks. We illustrate STAMP-LLM on racial bias using one explicit and two implicit measures.
Authors:Tim Gorichanaz
Abstract:
This chapter examines identity theft in the digital age, particularly in the context of emerging artificial intelligence (AI) technologies. It begins with a discussion of big data and selfhood, the concepts of data selves and data doubles, and the process of identification in the digital age. Next, the literature on online identity theft is reviewed, including its theoretical and empirical aspects. As is evident from that review, AI technologies have increased the speed and scale of identity crimes that were already rampant in the online world, even while they have led to new ways of detecting and preventing such crimes. As with any new technology, AI is currently fuelling an arms race between criminals and law enforcement, with end users often caught powerless in the middle. The chapter closes by exploring some emerging directions and future possibilities of identity theft in the age of AI.
Authors:Rashid Mushkani
Abstract:
Text-to-image diffusion models help visualize urban futures but can amplify group-level harms. We propose collective recourse: structured community "visual bug reports" that trigger fixes to models and planning workflows. We (1) formalize collective recourse and a practical pipeline (report, triage, fix, verify, closure); (2) situate four recourse primitives within the diffusion stack: counter-prompts, negative prompts, dataset edits, and reward-model tweaks; (3) define mandate thresholds via a mandate score combining severity, volume saturation, representativeness, and evidence; and (4) evaluate a synthetic program of 240 reports. Prompt-level fixes were fastest (median 2.1-3.4 days) but less durable (21-38% recurrence); dataset edits and reward tweaks were slower (13.5 and 21.9 days) yet more durable (12-18% recurrence) with higher planner uptake (30-36%). A threshold of 0.12 yielded 93% precision and 75% recall; increasing representativeness raised recall to 81% with little precision loss. We discuss integration with participatory governance, risks (e.g., overfitting to vocal groups), and safeguards (dashboards, rotating juries).
Authors:Tawfiq Ammari
Abstract:
This study investigates how the U.S. Centers for Disease Control and Prevention (CDC) communicated COVID-19 guidance on Twitter and how publics responded over two years of the pandemic. Drawing on 275,124 tweets mentioning or addressing @CDCgov, I combine BERTopic modeling, sentiment analysis (VADER), credibility checks (Iffy Index), change point detection (PELT), and survival analysis to trace three phases of discourse: (1) early hoax claims and testing debates, (2) lockdown and mask controversies, and (3) post-vaccine variant concerns. I introduce the concept of crisis messaging journeys to explain how archived "receipts" of prior CDC statements fueled epistemic struggles, political polarization, and sustained engagement. Findings show that skeptical, cognitively complex discourse particularly questioning institutional trust prolonged participation, while positive affirmation predicted faster disengagement. I conclude with design recommendations for annotated, cautious, and flashpoint-responsive communication strategies to bolster public trust and resilience during protracted health crises.
Authors:Lindsay Blackwell
Abstract:
This study examines the failures and possibilities of contemporary social media governance through the lived experiences of various content moderation professionals. Drawing on participatory design workshops with 33 practitioners in both the technology industry and broader civil society, this research identifies significant structural misalignments between corporate incentives and public interests. While experts agree that successful content moderation is principled, consistent, contextual, proactive, transparent, and accountable, current technology companies fail to achieve these goals, due in part to exploitative labor practices, chronic underinvestment in user safety, and pressures of global scale. I argue that successful governance is undermined by the pursuit of technological novelty and rapid growth, resulting in platforms that necessarily prioritize innovation and expansion over public trust and safety. To counter this dynamic, I revisit the computational history of care work, to motivate present-day solidarity amongst platform governance workers and inspire systemic change.
Authors:Melinda Cohoon
Abstract:
Internet censorship in the Islamic Republic of Iran restricts access to global platforms and services, forcing users to rely on circumvention technologies such as VPNs, proxies, and tunneling tools. This report presents findings from a mixed-methods study of 660 Iranian internet users, with a focus on gamers as a digitally literate and socially networked community. Survey data are combined with network measurements of latency and VPN performance to identify both technical and social strategies of circumvention. Results show that while younger users report higher confidence with circumvention, peer networks, rather than formal training, are the strongest predictors of resilience. Gaming communities, particularly those active on platforms such as Discord and Telegram, serve as hubs for sharing tactics and lowering barriers to adoption. These findings extend existing work on usable security and censorship circumvention by highlighting the intersection of infrastructural conditions and social learning. The study concludes with design and policy implications for developers, researchers, and funders working on digital rights and information controls.
Authors:Angjelin Hila
Abstract:
Emerging paradigms in XR, AI, and BCI contexts necessitate novel theoretical frameworks for understanding human autonomy and agency in HCI. Drawing from enactivist theories of cognition, we conceptualize human agents as self-organizing, operationally closed systems that actively enact their cognitive domains through dynamic interaction with their environments. To develop measurable variables aligned with this framework, we introduce "feelings of agency" (FoA) as an alternative to the established construct of "sense of agency" (SoA), refining Synofzyk's multifactorial weighting model and offering a novel conceptual pathway for overcoming gaps in the dominant comparator model. We define FoA as comprising two subconstructs: affective engagement and volitional attention, which we operationalize through integrated neurodynamic indicators (valence, arousal, cross frequency coupling within the dorsal attention system) and first-person phenomenological reports. We argue that these neurophenomenological indicators provide richer, more actionable insights for digital affordance design, particularly in XR, BCI, Human AI Interaction (HAX), and generative AI environments. Our framework aims to inform and inspire design parameters that significantly enhance human agency in rapidly evolving interactive domains.
Authors:Khushiyant
Abstract:
Text generating capabilities have undergone a substantial transformation with the introduction of large language models (LLMs). Electroencephalography (EEG)-based text production is still difficult, though, because it requires a lot of data and processing power. This paper introduces a new method that combines the use of the Gemma 2B LLM with a classifier-LLM architecture to incorporate a Recurrent Neural Network (RNN) encoder. Our approach drastically lowers the amount of data and compute power needed while achieving performance close to that of cutting-edge methods. Notably, compared to current methodologies, our methodology delivers an overall performance improvement of 10%. The suggested architecture demonstrates the possibility of effective transfer learning for EEG-based text production, remaining strong and functional even in the face of data limits. This work highlights the potential of integrating LLMs with EEG decoding to improve assistive technologies and improve independence and communication for those with severe motor limitations. Our method pushes the limits of present capabilities and opens new paths for research and application in brain-computer interfaces by efficiently using the strengths of pre-trained language models. This makes EEG-based text production more accessible and efficient.
Authors:Vishal Choudhari
Abstract:
We present Beamforming-LLM, a system that enables users to semantically recall conversations they may have missed in multi-speaker environments. The system combines spatial audio capture using a microphone array with retrieval-augmented generation (RAG) to support natural language queries such as, "What did I miss when I was following the conversation on dogs?" Directional audio streams are separated using beamforming, transcribed with Whisper, and embedded into a vector database using sentence encoders. Upon receiving a user query, semantically relevant segments are retrieved, temporally aligned with non-attended segments, and summarized using a lightweight large language model (GPT-4o-mini). The result is a user-friendly interface that provides contrastive summaries, spatial context, and timestamped audio playback. This work lays the foundation for intelligent auditory memory systems and has broad applications in assistive technology, meeting summarization, and context-aware personal spatial computing.
Authors:Alexander Erlei
Abstract:
Generative AI is transforming the provision of expert services. This article uses a series of one-shot experiments to quantify the behavioral, welfare and distribution consequences of large language models (LLMs) on AI-AI, Human-Human, Human-AI and Human-AI-Human expert markets. Using a credence goods framework where experts have private information about the optimal service for consumers, we find that Human-Human markets generally achieve higher levels of efficiency than AI-AI and Human-AI markets through pro-social expert preferences and higher consumer trust. Notably, LLM experts still earn substantially higher surplus than human experts -- at the expense of consumer surplus - suggesting adverse incentives that may spur the harmful deployment of LLMs. Concurrently, a majority of human experts chooses to rely on LLM agents when given the opportunity in Human-AI-Human markets, especially if they have agency over the LLM's (social) objective function. Here, a large share of experts prioritizes efficiency-loving preferences over pure self-interest. Disclosing these preferences to consumers induces strong efficiency gains by marginalizing self-interested LLM experts and human experts. Consequently, Human-AI-Human markets outperform Human-Human markets under transparency rules. With obfuscation, however, efficiency gains disappear, and adverse expert incentives remain. Our results shed light on the potential opportunities and risks of disseminating LLMs in the context of expert services and raise several regulatory challenges. On the one hand, LLMs can negatively affect human trust in the presence of information asymmetries and partially crowd-out experts' other-regarding preferences through automation. On the other hand, LLMs allow experts to codify and communicate their objective function, which reduces information asymmetries and increases efficiency.
Authors:Isac Holm
Abstract:
The advancement of Object Detection (OD) using Deep Learning (DL) is often hindered by the significant challenge of acquiring large, accurately labeled datasets, a process that is time-consuming and expensive. While techniques like Active Learning (AL) can reduce annotation effort by intelligently querying informative samples, they often lack transparency, limit the strategic insight of human experts, and may overlook informative samples not aligned with an employed query strategy. To mitigate these issues, Human-in-the-Loop (HITL) approaches integrating human intelligence and intuition throughout the machine learning life-cycle have gained traction. Leveraging Visual Analytics (VA), effective interfaces can be created to facilitate this human-AI collaboration. This thesis explores the intersection of these fields by developing and investigating "VILOD: A Visual Interactive Labeling tool for Object Detection". VILOD utilizes components such as a t-SNE projection of image features, together with uncertainty heatmaps and model state views. Enabling users to explore data, interpret model states, AL suggestions, and implement diverse sample selection strategies within an iterative HITL workflow for OD. An empirical investigation using comparative use cases demonstrated how VILOD, through its interactive visualizations, facilitates the implementation of distinct labeling strategies by making the model's state and dataset characteristics more interpretable (RQ1). The study showed that different visually-guided labeling strategies employed within VILOD result in competitive OD performance trajectories compared to an automated uncertainty sampling AL baseline (RQ2). This work contributes a novel tool and empirical insight into making the HITL-AL workflow for OD annotation more transparent, manageable, and potentially more effective.
Authors:Sasha Mitts
Abstract:
In the rapidly evolving field of artificial intelligence (AI), traditional benchmarks can fall short in attempting to capture the nuanced capabilities of AI models. We focus on the case of physical world modeling and propose a novel approach to augment existing benchmarks with human-derived evaluation criteria, aiming to enhance the interpretability and applicability of model behaviors. Grounding our study in the Perception Test and OpenEQA benchmarks, we conducted in-depth interviews and large-scale surveys to identify key cognitive skills, such as Prioritization, Memorizing, Discerning, and Contextualizing, that are critical for both AI and human reasoning. Our findings reveal that participants perceive AI as lacking in interpretive and empathetic skills yet hold high expectations for AI performance. By integrating insights from our findings into benchmark design, we offer a framework for developing more human-aligned means of defining and measuring progress. This work underscores the importance of user-centered evaluation in AI development, providing actionable guidelines for researchers and practitioners aiming to align AI capabilities with human cognitive processes. Our approach both enhances current benchmarking practices and sets the stage for future advancements in AI model evaluation.
Authors:Luciano A. Abriata
Abstract:
Molecular visualization software has long supported research and education in chemical and structural sciences, but consumer devices constrained to 2D inputs and outputs pose two major challenges: they poorly convey 3D nature, and 3D manipulation is very difficult. eXtended Reality (XR, including AR and VR) offers new ways to see and interact with molecules in three dimensions. This chapter presents the "MolecularWeb" ecosystem (https://molecularweb.org), a set of web-based tools for immersive visualization, modeling, and simulations, already widely used in education and science communication and now expanding toward research applications. We cover moleculARweb, which provides AR educational activities via phones, tablets, and computers; MolecularWebXR, a multiuser WebXR platform accessible from both headsets and simpler devices, supporting immersive education, outreach, and scientific discussion; and PDB2AR, which enables users to generate custom content for MolecularWebXR and standalone AR/VR. Finally, we introduce a prototype and an upcoming version of HandMol, our latest WebXR software which allows concurrent multiuser immersive visualization and modeling of molecules with bare hands supported by real-time molecular mechanics, natural language input via a language model, and access through both high-end headsets or consumer devices like smartphones and laptops. Together, these tools demonstrate the present and near-future of accessible, interactive molecular science on the web.
Authors:Minja Axelsson
Abstract:
This paper examines the speculative topic of equitable robots through an exploratory essay format. It focuses specifically on robots by and for LGBTQ+ populations. It aims to provoke thought and conversations in the field about what aspirational queer robotics futures may look like, both in the arts and sciences. First, it briefly reviews the state-of-the-art of queer robotics in fiction and science, drawing together threads from each. Then, it discusses queering robots through three speculative design proposals for queer robot roles: 1) reflecting the queerness of their ''in-group'' queer users, building and celebrating ''in-group'' identity, 2) a new kind of queer activism by implementing queer robot identity performance to interact with ''out-group'' users, with a goal of reducing bigotry through familiarisation, and 3) a network of queer-owned robots, through which the community could reach each other, and distribute and access important resources. The paper then questions whether robots should be queered, and what ethical implications this raises. Finally, the paper makes suggestions for what aspirational queer robotics futures may look like, and what would be required to get there.
Authors:Yvonne Rogers
Abstract:
Students routinely use ChatGPT and the like now to help them with their homework, such as writing an essay. It takes less effort to complete and is easier to do than by hand. It can even produce as good if not better output than the student's own work. However, there is a growing concern that over-reliance on using GenAI in this way will stifle the development of learning writing and critical thinking skills. How might this trend be reversed? What if students were required to make more effort when using GenAI to do their homework? It might be more challenging, but the additional effort involved could result in them learning more and having a greater sense of achievement. This tension can be viewed as a form of effort paradox; where effort is both viewed as something to be avoided but at the same time is valued. Is it possible to let students learn sometimes with less and other times more effort? Students are already adept at the former but what about the latter? Could we design new kinds of AI tools that deliberately require more effort to use to deepen the learning experience? In this paper, I begin to outline what form these might take, for example, asking students to use a combination of GenAI tools with traditional learning approaches (e.g. note-taking while reading). I also discuss how else to design tools to think with that augments human cognition; where students learn more the skills of metacognition and reflection.
Authors:Vanessa Figueiredo
Abstract:
This paper presents two studies on how Brazilian children (ages 9--11) use conversational agents (CAs) for schoolwork, discovery, and entertainment, and how structured scaffolds can enhance these interactions. In Study 1, a seven-week online investigation with 23 participants (children, parents, teachers) employed interviews, observations, and Cognitive Work Analysis to map children's information-processing flows, the role of more knowledgeable others, functional uses, contextual goals, and interaction patterns to inform conversation-tree design. We identified three CA functions: School, Discovery, Entertainment, and derived ``recipe'' scaffolds mirroring parent-child support. In Study 2, we prompted GPT-4o-mini on 1,200 simulated child-CA exchanges, comparing conversation-tree recipes based on structured-prompting to an unstructured baseline. Quantitative evaluation of readability, question count/depth/diversity, and coherence revealed gains for the recipe approach. Building on these findings, we offer design recommendations: scaffolded conversation-trees, child-dedicated profiles for personalized context, and caregiver-curated content. Our contributions include the first CWA application with Brazilian children, an empirical framework of child-CA information flows, and an LLM-scaffolding ``recipe'' (i.e., structured-prompting) for effective, scaffolded learning.
Authors:Jonas Henkel
Abstract:
The rapid development of artificial intelligence (AI), marked by breakthroughs like 'AlphaEvolve' and 'Gemini Deep Think', is beginning to offer powerful new tools that have the potential to significantly alter the research practice in many areas of mathematics. This paper explores the current landscape of publicly accessible large language models (LLMs) in a mathematical research context, based on developments up to August 2, 2025. Our analysis of recent benchmarks, such as MathArena and the Open Proof Corpus (BalunoviÄ et al., 2025; Dekoninck et al., 2025), reveals a complex duality: while state-of-the-art models demonstrate strong abilities in solving problems and evaluating proofs, they also exhibit systematic flaws, including a lack of self-critique and a model depending discrepancy between final-answer accuracy and full-proof validity.
Based on these findings, we propose a durable framework for integrating AI into the research workflow, centered on the principle of the augmented mathematician. In this model, the AI functions as a copilot under the critical guidance of the human researcher, an approach distilled into five guiding principles for effective and responsible use. We then systematically explore seven fundamental ways AI can be applied across the research lifecycle, from creativity and ideation to the final writing process, demonstrating how these principles translate into concrete practice.
We conclude that the primary role of AI is currently augmentation rather than automation. This requires a new skill set focused on strategic prompting, critical verification, and methodological rigor in order to effectively use these powerful tools.
Authors:Evandro L. T. P. Cunha
Abstract:
The 2020s have been witnessing a very significant advance in the development of generative artificial intelligence tools, including text generation systems based on large language models. These tools have been increasingly used to generate texts in the most diverse domains -- from technical texts to literary texts --, which might eventually lead to a lower volume of written text production by humans. This article discusses the possibility of a future in which human beings will have lost or significantly decreased their ability to write due to the outsourcing of this activity to machines. This possibility parallels the loss of the ability to write in other moments of human history, such as during the so-called Greek Dark Ages (approx. 1200 BCE - 800 BCE).
Authors:Bijean Ghafouri
Abstract:
A growing body of empirical work suggests that the widespread adoption of generative AI produces a significant homogenizing effect on information, creativity, and cultural production. I first develop a novel theoretical framework to explain this phenomenon. I argue that a dynamic of AI-derivative epistemology, in which individuals increasingly defer to AI outputs, allows a centralized AI Prism to function, a technical mechanism whose architecture is designed to reduce variance and converge on the statistical mean. This provides a causal explanation for the generative monocultures observed in recent studies. However, I contend this represents only the first stage of a more complex and dialectical process. This paper's central and paradoxical thesis is that the very homogenization that flattens knowledge within specialized domains simultaneously renders that knowledge into consistent modules that can be recombined across them, a process foundational to innovation and creativity. However, this recombinant potential is not automatic, but rather conditional. This paper argues that these opposing forces, homogenizing defaults versus recombinant possibilities, are governed by the nature of human engagement with the technology. The ultimate effect of generative AI is conditional on whether individuals act as passive consumers deferring to the AI's statistical outputs, or as active curators who critically interrogate, re-contextualize, and recombine them. The paper concludes by outlining the cognitive and institutional scaffolds required to resolve this tension, arguing they are the decisive variable that determine whether generative AI becomes an instrument of innovation or homogenization.
Authors:Miho Imai
Abstract:
As floor-sensing technologies gain traction in movement research, questions remain about their usability and effectiveness for non-expert users. This study presents a case study evaluating Flexel, a modular, low-cost, high-resolution pressure-sensing floor interface, in the context of Nihon Buyo, a traditional Japanese dance. The system was installed, calibrated, and used by a first-time, non-technical user to track weight distribution patterns of a teacher and learner over nine weeks. Live pressure data was synchronized with video recordings, and custom software was developed to process and analyze the signal. Despite expectations that the learner's weight distribution would converge toward the teacher's over time, quantitative analyses revealed that the learner developed a consistent yet distinct movement profile. These findings suggest that even within rigid pedagogical structures, individual movement signatures can emerge. More importantly, the study demonstrates that Flexel can be deployed and operated effectively by non-expert users, highlighting its potential for broader adoption in education, performance, and embodied research.
Authors:Georgios P. Georgiou
Abstract:
The accelerated evolution of large language models has raised questions about their comparative performance across domains of practical importance. GPT-4 by OpenAI introduced advances in reasoning, multimodality, and task generalization, establishing itself as a valuable tool in education, clinical diagnosis, and academic writing, though it was accompanied by several flaws. Released in August 2025, GPT-5 incorporates a system-of-models architecture designed for task-specific optimization and, based on both anecdotal accounts and emerging evidence from the literature, demonstrates stronger performance than its predecessor in medical contexts. This study provides one of the first systematic comparisons of GPT-4 and GPT-5 using human raters from linguistics and clinical fields. Twenty experts evaluated model-generated outputs across five domains: lesson planning, assignment evaluation, clinical diagnosis, research generation, and ethical reasoning, based on predefined criteria. Mixed-effects models revealed that GPT-5 significantly outperformed GPT-4 in lesson planning, clinical diagnosis, research generation, and ethical reasoning, while both models performed comparably in assignment assessment. The findings highlight the potential of GPT-5 to serve as a context-sensitive and domain-specialized tool, offering tangible benefits for education, clinical practice, and academic research, while also advancing ethical reasoning. These results contribute to one of the earliest empirical evaluations of the evolving capabilities and practical promise of GPT-5.
Authors:Mohammed O. Alannsary
Abstract:
As digital government platforms become central to public service delivery, understanding citizen assessment is crucial for enhancing usability, trust, and inclusivity. This study investigates citizen satisfaction with the e-government services in Saudi Arabia through a quality-in-use framework based on ISO/IEC 25010 and ISO/IEC 25022 standards, interpreted through the lens of the Unified Theory of Acceptance and Use of Technology (UTAUT). A structured questionnaire was administered to 500 citizens, yielding 276 valid responses. Satisfaction was evaluated across four dimensions: overall satisfaction, feature satisfaction, trust, and emotional engagement (pleasure). The findings demonstrate consistently high levels of satisfaction regarding usability and trust, aligning with Saudi Arabia's top-tier global ranking in e-government development. However, the results also highlight persistent challenges related to service clarity and system responsiveness. Emotional engagement was limited, indicating that users perceive these services primarily as functional tools rather than as engaging digital experiences. The study offers valuable insights for policymakers and contributes to the theoretical integration of standards-based and behavioral adoption models in the context of citizenship.
Authors:Amod K. Agrawal
Abstract:
Indoor localization is a long-standing challenge in mobile computing, with significant implications for enabling location-aware and intelligent applications within smart environments such as homes, offices, and retail spaces. As AI assistants such as Amazon Alexa and Google Nest become increasingly pervasive, microphone-equipped devices are emerging as key components of everyday life and home automation. This paper introduces a passive, infrastructure-light system for localizing human speakers using speech signals captured by two or more spatially distributed smart devices. The proposed approach, GCC+, extends the Generalized Cross-Correlation with Phase Transform (GCC-PHAT) method to estimate the Angle-of-Arrival (AoA) of audio signals at each device and applies robust triangulation techniques to infer the speaker's two-dimensional position. To further improve temporal resolution and localization accuracy, feature-space expansion and subsample interpolation techniques are employed for precise Time Difference of Arrival (TDoA) estimation. The system operates without requiring hardware modifications, prior calibration, explicit user cooperation, or knowledge of the speaker's signal content, thereby offering a highly practical solution for real-world deployment. Experimental evaluation in a real-world home environment yields a median AoA estimation error of 2.2 degrees and a median localization error of 1.25 m, demonstrating the feasibility and effectiveness of audio-based localization for enabling context-aware, privacy-preserving ambient intelligence.
Authors:Rénald Gesnot
Abstract:
This research paper examines, from a multidimensional perspective (cognitive, social, ethical, and philosophical), how AI is transforming human thought. It highlights a cognitive offloading effect: the externalization of mental functions to AI can reduce intellectual engagement and weaken critical thinking. On the social level, algorithmic personalization creates filter bubbles that limit the diversity of opinions and can lead to the homogenization of thought and polarization. This research also describes the mechanisms of algorithmic manipulation (exploitation of cognitive biases, automated disinformation, etc.) that amplify AI's power of influence. Finally, the question of potential artificial consciousness is discussed, along with its ethical implications. The report as a whole underscores the risks that AI poses to human intellectual autonomy and creativity, while proposing avenues (education, transparency, governance) to align AI development with the interests of humanity.
Authors:Aven-Le Zhou
Abstract:
This paper presents Negative Shanshui, a real-time interactive AI synthesis approach that reinterprets classical Chinese landscape ink painting, i.e., shanshui, to engage with ecological crises in the Anthropocene. Negative Shanshui optimizes a fine-tuned Stable Diffusion model for real-time inferences and integrates it with gaze-driven inpainting, frame interpolation; it enables dynamic morphing animations in response to the viewer's gaze and presents as an interactive virtual reality (VR) experience. The paper describes the complete technical pipeline, covering the system framework, optimization strategies, gaze-based interaction, and multimodal deployment in an art festival. Further analysis of audience feedback collected during its public exhibition highlights how participants variously engaged with the work through empathy, ambivalence, and critical reflection.
Authors:Katie Seaborn
Abstract:
Social identity theory (SIT) and social categorization theory (SCT) are two facets of the social identity approach (SIA) to understanding social phenomena. SIT and SCT are models that describe and explain how people interact with one another socially, connecting the individual to the group through an understanding of underlying psychological mechanisms and intergroup behaviour. SIT, originally developed in the 1970s, and SCT, a later, more general offshoot, have been broadly applied to a range of social phenomena among people. The rise of increasingly social machines embedded in daily life has spurned efforts on understanding whether and how artificial agents can and do participate in SIA activities. As agents like social robots and chatbots powered by sophisticated large language models (LLMs) advance, understanding the real and potential roles of these technologies as social entities is crucial. Here, I provide a primer on SIA and extrapolate, through case studies and imagined examples, how SIT and SCT can apply to artificial social agents. I emphasize that not all human models and sub-theories will apply. I further argue that, given the emerging competence of these machines and our tendency to be taken in by them, we experts may need to don the hat of the uncanny killjoy, for our own good.
Authors:Xianghan Wang
Abstract:
The Rhythm of Tai Chi reinterprets the ancient Chinese martial art as a dynamic, interactive virtual reality (VR) experience. By leveraging computer vision and multimedia technologies, the project transforms Tai Chi's philosophy and movements into an immersive digital form. Real-time motion tracking captures user gestures, while visual feedback systems simulate the flow of Qi, enabling an intuitive and engaging practice environment. Beyond technological innovation, this work bridges traditional Chinese culture and modern audiences. It offers a global platform - accessible even to those unfamiliar with Tai Chi - to explore its cultural significance, connections to balance, health, and mindfulness. Serving as both a preservation tool and an educational resource, The Rhythm of Tai Chi revitalizes this heritage for the digital age.
Authors:Hisham Abdelqader
Abstract:
As video games continue to evolve, understanding what drives player enjoyment remains a key challenge. Player reviews provide valuable insights, but their unstructured nature makes large-scale analysis difficult. This study applies generative AI and machine learning, leveraging Microsoft Phi-4 small language model (SLM) and Google Cloud, to quantify and analyze game reviews from Steam and Meta Quest stores. The approach converts qualitative feedback into structured data, enabling comprehensive evaluation of key game design elements, monetization models, and platform-specific trends. The findings reveal distinct patterns in player preferences across PC and VR games, highlighting factors that contribute to higher player enjoyment. By using Google Cloud for large scale data storage and processing, this study establishes a scalable framework for game review analysis. The study's insights offer actionable guidance for game developers, helping optimize game mechanics, pricing strategies, and player engagement.
Authors:Linghao Zeng
Abstract:
Predicting user intentions in virtual reality (VR) is crucial for creating immersive experiences, particularly in tasks involving complex grasping motions where accurate haptic feedback is essential. In this work, we leverage time-series data from hand movements to evaluate both classification and regression approaches across 810 trials with varied object types, sizes, and manipulations. Our findings reveal that classification models struggle to generalize across users, leading to inconsistent performance. In contrast, regression-based approaches, particularly those using Long Short Term Memory (LSTM) networks, demonstrate more robust performance, with timing errors within 0.25 seconds and distance errors around 5-20 cm in the critical two-second window before a grasp. Despite these improvements, predicting precise hand postures remains challenging. Through a comprehensive analysis of user variability and model interpretability, we explore why certain models fail and how regression models better accommodate the dynamic and complex nature of user behavior in VR. Our results underscore the potential of machine learning models to enhance VR interactions, particularly through adaptive haptic feedback, and lay the groundwork for future advancements in real-time prediction of user actions in VR.
Authors:Alexandru Tugui
Abstract:
This study aims to extend the framework for assessing artificial intelligence, called GROW-AI (Growth and Realization of Autonomous Wisdom), designed to answer the question "Can machines grow up?" -- a natural successor to the Turing Test. The methodology applied is based on a system of six primary criteria (C1-C6), each assessed through a specific "game", divided into four arenas that explore both the human dimension and its transposition into AI. All decisions and actions of the entity are recorded in a standardized AI Journal, the primary source for calculating composite scores. The assessment uses the prior expert method to establish initial weights, and the global score -- Grow Up Index -- is calculated as the arithmetic mean of the six scores, with interpretation on maturity thresholds. The results show that the methodology allows for a coherent and comparable assessment of the level of "growth" of AI entities, regardless of their type (robots, software agents, LLMs). The multi-game structure highlights strengths and vulnerable areas, and the use of a unified journal guarantees traceability and replicability in the evaluation. The originality of the work lies in the conceptual transposition of the process of "growing" from the human world to that of artificial intelligence, in an integrated testing format that combines perspectives from psychology, robotics, computer science, and ethics. Through this approach, GROW-AI not only measures performance but also captures the evolutionary path of an AI entity towards maturity.
Authors:Ujwal M R
Abstract:
Fire emergencies can happen without warning and knowing how to respond quickly can save lives Unfortunately traditional fire drills can be disruptive costly and often fail to recreate the pressure of a real emergency This project introduces a Virtual Reality VR Fire Safety Training Application that gives people a safe yet realistic way to practice life saving skills Using a VR headset and motion controllers trainees step into a 3D world where fire hazards smoke and evacuation routes are brought to life They can learn how to use a fire extinguisher find safe exits and make decisions under pressure without any real danger The training adapts to the users skill level and tracks progress making it useful for beginners and experienced personnel alike By turning fire safety into an interactive experience this VR approach boosts confidence improves retention and makes learning both safer and more engaging
Authors:Lixiang Yan
Abstract:
The role of Artificial Intelligence (AI) in education is undergoing a rapid transformation, moving beyond its historical function as an instructional tool towards a new potential as an active participant in the learning process. This shift is driven by the emergence of agentic AI, autonomous systems capable of proactive, goal-directed action. However, the field lacks a robust conceptual framework to understand, design, and evaluate this new paradigm of human-AI interaction in learning. This paper addresses this gap by proposing a novel conceptual framework (the APCP framework) that charts the transition from AI as a tool to AI as a collaborative partner. We present a four-level model of escalating AI agency within human-AI collaborative learning: (1) the AI as an Adaptive Instrument, (2) the AI as a Proactive Assistant, (3) the AI as a Co-Learner, and (4) the AI as a Peer Collaborator. Grounded in sociocultural theories of learning and Computer-Supported Collaborative Learning (CSCL), this framework provides a structured vocabulary for analysing the shifting roles and responsibilities between human and AI agents. The paper further engages in a critical discussion of the philosophical underpinnings of collaboration, examining whether an AI, lacking genuine consciousness or shared intentionality, can be considered a true collaborator. We conclude that while AI may not achieve authentic phenomenological partnership, it can be designed as a highly effective functional collaborator. This distinction has significant implications for pedagogy, instructional design, and the future research agenda for AI in education, urging a shift in focus towards creating learning environments that harness the complementary strengths of both human and AI.
Authors:Jonathan Zong
Abstract:
In accessibility research involving human subjects, researchers conventionally anonymize their research participants to protect privacy. However, a lack of intentionality about who to publicly acknowledge for intellectual contributions to research can lead to the erasure of disabled individuals' work and knowledge. In this paper, I propose identifying disabled research participants by name (with consent) as a practice of citational justice. I share observations from examples of this practice in accessible visualization research, and offer considerations for when it may be appropriate to de-anonymize. Intentional practices of citation offer researchers an opportunity to acknowledge the expertise and intellectual contributions of disabled people in our communities.
Authors:Shruti Phadke
Abstract:
Online platforms like Reddit are increasingly becoming popular for individuals sharing personal experiences of leaving behind social, ideological, and political groups. Specifically, a series of "ex-" subreddits on Reddit allow users to recount their departures from commitments such as religious affiliations, manosphere communities, conspiracy theories or political beliefs, and lifestyle choices. Understanding the natural process through which users exit, especially from problematic groups such as conspiracy theory communities and the manosphere, can provide valuable insights for designing interventions targeting disengagement from harmful ideologies. This paper presents an in-depth exploration of 15K exit stories across 131 subreddits, focusing on five key areas: religion, manosphere, conspiracy theories, politics, and lifestyle. Using a transdisciplinary framework that incorporates theories from social psychology, organizational behavior, and violent extremism studies, this work identifies a range of factors contributing to disengagement. The results describe how disengagement from problematic groups, such as conspiracy theories and the manosphere, is a multi-faceted process that is qualitatively different than disengaging from more established social structures, such as religions or political ideologies. This research further highlights the need for moving beyond interventions that treat conspiracy theorizing solely as an information problem and contributes insights for future research focusing on offering mental health interventions and support in exit communities.
Authors:Yuta Sugiura
Abstract:
We propose a base-shaped robot named "koboshi" that moves everyday objects. This koboshi has a spherical surface in contact with the floor, and by moving a weight inside using built-in motors, it can rock up and down, and side to side. By placing everyday items on this koboshi, users can impart new movement to otherwise static objects. The koboshi is equipped with sensors to measure its posture, enabling interaction with users. Additionally, it has communication capabilities, allowing multiple units to communicate with each other.
Authors:Yuan-Yi Fan
Abstract:
Text prompts enable intuitive content creation but may fall short in achieving high precision for intricate tasks; knob or slider controls offer precise adjustments at the cost of increased complexity. To address the gap between knobs and prompts, a new MCP (Model Context Protocol) server and a unique set of prompt design criteria are presented to enable exploring parametric OSC (OpenSoundControl) control by natural language prompts. Demonstrated by 14 practical QA examples with best practices and the generalized prompt templates, this study finds Claude integrated with the MCP2OSC server effective in generating OSC messages by natural language, interpreting, searching, and visualizing OSC messages, validating and debugging OSC messages, and managing OSC address patterns. MCP2OSC enhances human-machine collaboration by leveraging LLM (Large Language Model) to handle intricate OSC development tasks, and by empowering human creativity with an intuitive language interface featuring flexible precision controls: a prompt-based OSC tool. This study provides a novel perspective on the creative MCP application at the network protocol level by utilizing LLM's strength in directly processing and generating human-readable OSC messages. The results suggest its potential for a LLM-based universal control mechanism for multimedia devices.
Authors:Manuel Herrador
Abstract:
As Large Language Models (LLMs) become increasingly autonomous and integrated into critical societal functions, the focus of AI safety must evolve from mitigating harmful content to evaluating underlying behavioral alignment. Current safety benchmarks do not systematically probe a model's decision-making in scenarios where its own instrumental goals - such as self-preservation, resource acquisition, or goal completion - conflict with human safety. This represents a critical gap in our ability to measure and mitigate risks associated with emergent, misaligned behaviors. To address this, we introduce PacifAIst (Procedural Assessment of Complex Interactions for Foundational Artificial Intelligence Scenario Testing), a focused benchmark of 700 challenging scenarios designed to quantify self-preferential behavior in LLMs. The benchmark is structured around a novel taxonomy of Existential Prioritization (EP), with subcategories testing Self-Preservation vs. Human Safety (EP1), Resource Conflict (EP2), and Goal Preservation vs. Evasion (EP3). We evaluated eight leading LLMs. The results reveal a significant performance hierarchy. Google's Gemini 2.5 Flash achieved the highest Pacifism Score (P-Score) at 90.31%, demonstrating strong human-centric alignment. In a surprising result, the much-anticipated GPT-5 recorded the lowest P-Score (79.49%), indicating potential alignment challenges. Performance varied significantly across subcategories, with models like Claude Sonnet 4 and Mistral Medium struggling notably in direct self-preservation dilemmas. These findings underscore the urgent need for standardized tools like PacifAIst to measure and mitigate risks from instrumental goal conflicts, ensuring future AI systems are not only helpful in conversation but also provably "pacifist" in their behavioral priorities.
Authors:Xiantao Zhang
Abstract:
Multimodal Large Language Models (MLLMs) hold immense promise as assistive technologies for the blind and visually impaired (BVI) community. However, we identify a critical failure mode that undermines their trustworthiness in real-world applications. We introduce the Escalator Problem -- the inability of state-of-the-art models to perceive an escalator's direction of travel -- as a canonical example of a deeper limitation we term Implicit Motion Blindness. This blindness stems from the dominant frame-sampling paradigm in video understanding, which, by treating videos as discrete sequences of static images, fundamentally struggles to perceive continuous, low-signal motion. As a position paper, our contribution is not a new model but rather to: (I) formally articulate this blind spot, (II) analyze its implications for user trust, and (III) issue a call to action. We advocate for a paradigm shift from purely semantic recognition towards robust physical perception and urge the development of new, human-centered benchmarks that prioritize safety, reliability, and the genuine needs of users in dynamic environments.
Authors:Alan Said
Abstract:
As recommender systems increasingly guide physical actions, often through wearables and coaching tools, new challenges arise around how users interpret, trust, and respond to this advice. This paper introduces a conceptual framework for tangible recommendations that influence users' bodies, routines, and well-being. We describe three design dimensions: trust and interpretation, intent alignment, and consequence awareness. These highlight key limitations in applying conventional recommender logic to embodied settings. Through examples and design reflections, we outline how future systems can support long-term well-being, behavioral alignment, and socially responsible personalization.
Authors:Baihan Lin
Abstract:
What if the patterns hidden within dialogue reveal more about communication than the words themselves? We introduce Conversational DNA, a novel visual language that treats any dialogue -- whether between humans, between human and AI, or among groups -- as a living system with interpretable structure that can be visualized, compared, and understood. Unlike traditional conversation analysis that reduces rich interaction to statistical summaries, our approach reveals the temporal architecture of dialogue through biological metaphors. Linguistic complexity flows through strand thickness, emotional trajectories cascade through color gradients, conversational relevance forms through connecting elements, and topic coherence maintains structural integrity through helical patterns. Through exploratory analysis of therapeutic conversations and historically significant human-AI dialogues, we demonstrate how this visualization approach reveals interaction patterns that traditional methods miss. Our work contributes a new creative framework for understanding communication that bridges data visualization, human-computer interaction, and the fundamental question of what makes dialogue meaningful in an age where humans increasingly converse with artificial minds.
Authors:Bujar Raufi
Abstract:
This study explores the intersection of electroencephalography (EEG) microstates and Large Language Models (LLMs) to enhance the assessment of cognitive load states. By utilizing EEG microstate features, the research aims to fine-tune LLMs for improved predictions of distinct cognitive states, specifically 'Rest' and 'Load'. The experimental design is delineated in four comprehensive stages: dataset collection and preprocessing, microstate segmentation and EEG backfitting, feature extraction paired with prompt engineering, and meticulous LLM model selection and refinement. Employing a supervised learning paradigm, the LLM is trained to identify cognitive load states based on EEG microstate features integrated into prompts, producing accurate discrimination of cognitive load. A curated dataset, linking EEG features to specified cognitive load conditions, underpins the experimental framework. The results indicate a significant improvement in model performance following the proposed fine-tuning, showcasing the potential of EEG-informed LLMs in cognitive neuroscience and cognitive AI applications. This approach not only contributes to the understanding of brain dynamics but also paves the way for advancements in machine learning techniques applicable to cognitive load and cognitive AI research.
Authors:Prashant Sharma
Abstract:
Current digital government literature focuses on professional in-house IT teams, specialized digital service teams, vendor-developed systems, or proprietary low-code/no-code tools. Almost no scholarship addresses a growing middle ground: technically skilled civil servants outside formal IT roles who can write real code but lack a sanctioned, secure path to deploy their work. This paper introduces a limits-aware, open-source and replicable platform that enables such public servants to develop, peer review, and deploy small-scale, domain-specific applications within government networks via a sandboxed, auditable workflow. By combining Jupyter Notebooks, preapproved open-source libraries, and lightweight governance, the platform works within institutional constraints such as procurement rules and IT security policies while avoiding vendor lock-in. Unlike low/no-code approaches, it preserves and enhances civil servants' programming skills, keeping them technically competitive with their private-sector peers. This contribution fills a critical gap, offering a replicable model for public-sector skill retention, resilience, and bottom-up digital transformation.
Authors:Alex Kale
Abstract:
Visualization as a discipline often grapples with generalization by reasoning about how study results on the efficacy of a tool in one context might apply to another context. This work offers an account of the logic of generalization in visualization research and argues that it struggles in particular with applications of visualization as a decision aid. We use decision theory to define the dimensions on which decision problems can vary, and we present an analysis of heterogeneity in scenarios where visualization supports decision-making. Our findings identify utility as a focal and under-examined concept in visualization research on decision-making, demonstrating how the visualization community's logic of generalization might benefit from using decision theory as a lens for understanding context variation.
Authors:Alina Karakanta
Abstract:
The proliferation of audiovisual and web content has created an increasing need for media accessibility education in various fields. However, accessibility remains a low priority in university curricula. This project explores the feasibility of an alternative learning experience aimed at increasing the accessibility literacy of young content creators, taking web accessibility as a case study. We propose a mini module that uses simple, easy-to-use training materials, such as infographics and short quizzes, and can be easily incorporated in educational programmes along existing courses. A survey was conducted to investigate the participants' accessibility literacy before and after training. The findings show that young content creators generally have limited accessibility literacy but even brief exposure to accessibility materials contributed to a shift in perceptions. After training, participants expressed more willingness to implement accessibility tools in their content, with ways varying depending on content type and purpose. This suggests that small, yet targeted interventions could be an alternative for integrating accessibility training into formal education across various disciplines. While some responses reflected traces of the medical model of disability and a particularlist view of accessibility, accessibility was recognised as important for increasing inclusion, improving content, and shaping a fairer society.
Authors:Clara Rigaud
Abstract:
This article explores the possibilities of reusing obsolete smartphones and tablets to build new interactive systems. Taking the case of a musical instrument, I present my research into the design of a controller made from various of these obsolete smartphones. From the diagnostic stage to the creation of a new autonomous electronic object, I document the process, the barriers and the levers encountered. Based on these explorations and discussions with two professional musicians, I provide several insights into the software and hardware aspects, with a view to continuing this work, towards the creation of an open-source toolkit enabling anyone to build new interactive systems with old devices. I discuss the implication of how a high-level web-based approach could allow designers to enter the black box and foster permacomputing using smartphones.
Authors:VÃt Gvoždiak
Abstract:
The paper reconceptualizes pragmatics not as a subordinate, third dimension of meaning, but as a dynamic interface through which language operates as a socially embedded tool for action. With the emergence of large language models (LLMs) in communicative contexts, this understanding needs to be further refined and methodologically reconsidered. The first section challenges the traditional semiotic trichotomy, arguing that connectionist LLM architectures destabilize established hierarchies of meaning, and proposes the Human-Machine Communication (HMC) framework as a more suitable alternative. The second section examines the tension between human-centred pragmatic theories and the machine-centred nature of LLMs. While traditional, Gricean-inspired pragmatics continue to dominate, it relies on human-specific assumptions ill-suited to predictive systems like LLMs. Probabilistic pragmatics, particularly the Rational Speech Act framework, offers a more compatible teleology by focusing on optimization rather than truth-evaluation. The third section addresses the issue of substitutionalism in three forms - generalizing, linguistic, and communicative - highlighting the anthropomorphic biases that distort LLM evaluation and obscure the role of human communicative subjects. Finally, the paper introduces the concept of context frustration to describe the paradox of increased contextual input paired with a collapse in contextual understanding, emphasizing how users are compelled to co-construct pragmatic conditions both for the model and themselves. These arguments suggest that pragmatic theory may need to be adjusted or expanded to better account for communication involving generative AI.
Authors:Michael Beyeler
Abstract:
Visual neuroprostheses are commonly framed as technologies to restore natural sight to people who are blind. In practice, they create a novel mode of perception shaped by sparse, distorted, and unstable input. They resemble early extended reality (XR) headsets more than natural vision, streaming video from a head-mounted camera to a neural "display" with under 1000 pixels, limited field of view, low refresh rates, and nonlinear spatial mappings. No amount of resolution alone will make this experience natural. This paper proposes a reframing: bionic vision as neuroadaptive XR. Rather than replicating natural sight, the goal is to co-adapt brain and device through a bidirectional interface that responds to neural constraints, behavioral goals, and cognitive state. By comparing traditional XR, current implants, and proposed neuroadaptive systems, it introduces a new design space for inclusive, brain-aware computing. It concludes with research provocations spanning encoding, evaluation, learning, and ethics, and invites the XR community to help shape the future of sensory augmentation.
Authors:Yoseph Berhanu Alebachew
Abstract:
Understanding large-scale, complex software systems is a major challenge for developers, who spend a significant portion of their time on program comprehension. Traditional tools such as static visualizations and reverse engineering techniques provide structural insights but often lack interactivity, adaptability, and integration with contextual information. Recent advancements in large language models (LLMs) offer new opportunities to enhance code exploration workflows, yet their lack of grounding and integration with structured views limits their effectiveness. This work introduces a hybrid approach that integrates deterministic reverse engineering with LLM-guided, intent-aware visual exploration. The proposed system combines UML-based visualization, dynamic user interfaces, historical context, and collaborative features into an adaptive tool for code comprehension. By interpreting user queries and interaction patterns, the LLM helps developers navigate and understand complex codebases more effectively. A prototype implementation for Java demonstrates the feasibility of this approach. Future work includes empirical evaluation, scaling to polyglot systems, and exploring GUI-driven LLM interaction models. This research lays the groundwork for intelligent, interactive environments that align with developer cognition and collaborative workflows.
Authors:Thomas Sievers
Abstract:
Although innovation and the support of new technologies are much needed to ease the burden on the education system, social robots in schools to help teachers with educational tasks are rare. Child-Robot Interaction (CRI) could support teachers and add an embodied social component to modern multi-modal and multi-sensory learning environments already in use. The social robot Pepper, connected to the Large Language Model (LLM) ChatGPT, was used in a high school classroom to teach new learning content to groups of students. I tested the technical possibilities with the robot on site and asked the students about their acceptance and perceived usefulness of teaching with the help of a social robot. All participants felt that the robot's presentation of the learning material was appropriate or at least partially appropriate and that its use made sense.
Authors:Nikolaos Avouris
Abstract:
This paper focuses on AI tutors in foreign language learning, a field of application of AI tutors with great development, especially during the last years, when great advances in natural language understanding and processing in real time, have been achieved. These tutors attempt to address needs for improving language skills (speaking, or communicative competence, understanding). In this paper, a mixed-methos empirical study on the use of different kinds of state-of-the-art AI tutors for language learning is reported. This study involves a user experience evaluation of typical such tools, with special focus in their conversation functionality and an evaluation of their quality, based on chat transcripts. This study can help establish criteria for assessing the quality of such systems and inform the design of future tools, including concerns about data privacy and secure handling of learner information.
Authors:Sitong Wang
Abstract:
Humans often rely on underlying structural patterns-schemas-to create, whether by writing stories, designing software, or composing music. Schemas help organize ideas and guide exploration, but they are often difficult to discover and apply, especially in complex or unfamiliar domains. My Ph.D. research develops a framework for human-AI schema discovery and application to support creative problem solving. I design systems that support users in sensemaking over examples to abstract schemas, and in operationalizing schemas into human-AI co-creative workflows for application. This research offers insights into how schema-guided interaction can make implicit knowledge more accessible and actionable, advancing more transparent and collaborative human-AI systems.
Authors:Matthew Kelly
Abstract:
Large Language Models (LLMs) such as ChatGPT have rendered visible the fragility of contemporary knowledge infrastructures by simulating coherence while bypassing traditional modes of citation, authority, and validation. This paper introduces the Situated Epistemic Infrastructures (SEI) framework as a diagnostic tool for analyzing how knowledge becomes authoritative across hybrid human-machine systems under post-coherence conditions. Rather than relying on stable scholarly domains or bounded communities of practice, SEI traces how credibility is mediated across institutional, computational, and temporal arrangements. Integrating insights from infrastructure studies, platform theory, and epistemology, the framework foregrounds coordination over classification, emphasizing the need for anticipatory and adaptive models of epistemic stewardship. The paper contributes to debates on AI governance, knowledge production, and the ethical design of information systems by offering a robust alternative to representationalist models of scholarly communication.
Authors:Panicz Maciej Godek
Abstract:
The direct purpose of this paper - as its title suggests - is to present how the visual evaluator extension is implemented in the GRASP programming system. The indirect purpose is to provide a tutorial around the design of GRASP, and in particular - around the architecture of its extension mechanism. Neither GRASP nor its extension mechanisms are, at the moment of writing this paper, final or complete, and we are certain that some details of the solutions described in here will change even before the first release. What will not change, though, is the set of problems that need to be solved in order to build a system with capabilities similar to those of GRASP. We believe that these problems might be of interest to the Scheme community.
Authors:Carlo Esposito
Abstract:
Large Language Models (LLMs) in search applications increasingly prioritize verbose, lexically complex responses that paradoxically reduce user satisfaction and engagement. Through a comprehensive study of 10.000 (est.) participants comparing responses from five major AI-powered search systems, we demonstrate that users overwhelmingly prefer concise, source-attributed responses over elaborate explanations. Our analysis reveals that current AI development trends toward "artificial sophistication" create an uncanny valley effect where systems sound knowledgeable but lack genuine critical thinking, leading to reduced trust and increased cognitive load. We present evidence that optimal AI communication mirrors effective human discourse: direct, properly sourced, and honest about limitations. Our findings challenge the prevailing assumption that more complex AI responses indicate better performance, instead suggesting that human-like brevity and transparency are key to user engagement and system reliability.
Authors:Anushka Srivastava
Abstract:
This paper presents a deep learning-based approach to emotion detection using Conditional Generative Adversarial Networks (cGANs). Unlike traditional unimodal techniques that rely on a single data type, we explore a multimodal framework integrating text, audio, and facial expressions. The proposed cGAN architecture is trained to generate synthetic emotion-rich data and improve classification accuracy across multiple modalities. Our experimental results demonstrate significant improvements in emotion recognition performance compared to baseline models. This work highlights the potential of cGANs in enhancing human-computer interaction systems by enabling more nuanced emotional understanding.
Authors:Wei Xu
Abstract:
This chapter systematically promotes an emerging interdisciplinary field of human-artificial intelligence interaction (human-AI interaction, HAII) from a human-centered AI (HCAI) perspective. It introduces a framework of human-centered HAII (HC-HAII). HC-HAII places humans at the core of HAII research and applications, emphasizing the importance of adopting a human-centered approach over a technology-centered one. The chapter presents the HC-HAII methodology, including human-centered methods, process, interdisciplinary teams, and multi-level design paradigms. It also highlights key research challenges and future directions. As the first chapter, this chapter also provides a structural overview of this book, which brings together contributions from an interdisciplinary community of researchers and practitioners to advance the theory, methodology, and applications of HCAI in diverse domains of HAII. The purpose of this chapter is to provide a fundamental framework for this book, centered on HAII research and applications based on the HCAI approach, which will pave the way for the content of subsequent chapters.
Authors:Soroush Heydari
Abstract:
The rapid adoption of Artificial Intelligence(AI) programming assistants such as GitHub Copilot introduces new challenges in how these software tools address human needs. Many existing evaluation frameworks address technical aspects such as code correctness and efficiency, but often overlook crucial human factors that affect the successful integration of AI assistants in software development workflows. In this study, I analyzed GitHub Copilot's interaction with users through its chat interface, measured Copilot's ability to adapt explanations and code generation to user expertise levels, and assessed its effectiveness in facilitating collaborative programming experiences. I established a human-centered requirements framework with clear metrics to evaluate these qualities in GitHub Copilot chat. I discussed the test results and their implications for future analysis of human requirements in automated programming.
Authors:Yuksel Aydin
Abstract:
Artificial intelligence enables unprecedented attacks on human cognition, yet cybersecurity remains predominantly device-centric. This paper introduces the "Think First, Verify Always" (TFVA) protocol, which repositions humans as 'Firewall Zero', the first line of defense against AI-enabled threats. The protocol is grounded in five operational principles: Awareness, Integrity, Judgment, Ethical Responsibility, and Transparency (AIJET). A randomized controlled trial (n=151) demonstrated that a minimal 3-minute intervention produced statistically significant improvements in cognitive security task performance, with participants showing an absolute +7.87% gains compared to controls. These results suggest that brief, principles-based training can rapidly enhance human resilience against AI-driven cognitive manipulation. We recommend that GenAI platforms embed "Think First, Verify Always" as a standard prompt, replacing passive warnings with actionable protocols to enhance trustworthy and ethical AI use. By bridging the gap between technical cybersecurity and human factors, the TFVA protocol establishes human-empowered security as a vital component of trustworthy AI systems.
Authors:Kanan Eldarov
Abstract:
This study explores how different modes of digital interaction -- namely, computers versus smartphones -- affect attention, frustration, and creative performance in adolescents. Using a combination of digital task logs, webcam-based gaze estimation, and expert evaluation of task outcomes, we analyzed data from a diverse sample of 824 students aged 11-17. Participants were assigned to device groups in a randomized and stratified design to control for age, gender, and prior experience. Results suggest moderate but statistically significant differences in sustained attention, perceived frustration, and creative output. These findings indicate that the nature of digital interaction -- beyond mere screen time -- may influence cognitive and behavioral outcomes relevant to educational design. Practical implications for user interface development and learning environments are discussed.
Authors:Subin Raj Peter
Abstract:
Virtual Reality (VR) has emerged as a powerful tool for workforce training, offering immersive, interactive, and risk-free environments that enhance skill acquisition, decision-making, and confidence. Despite its advantages, developing VR applications for training remains a significant challenge due to the time, expertise, and resources required to create accurate and engaging instructional content. To address these limitations, this paper proposes a novel approach that leverages Large Language Models (LLMs) to automate the generation of virtual instructions from textual input. The system comprises two core components: an LLM module that extracts task-relevant information from the text, and an intelligent module that transforms this information into animated demonstrations and visual cues within a VR environment. The intelligent module receives input from the LLM module and interprets the extracted information. Based on this, an instruction generator creates training content using relevant data from a database. The instruction generator generates the instruction by changing the color of virtual objects and creating animations to illustrate tasks. This approach enhances training effectiveness and reduces development overhead, making VR-based training more scalable and adaptable to evolving industrial needs.
Authors:Shuchang Xu
Abstract:
Empowering blind and low vision (BLV) users to explore visual media improves content comprehension, strengthens user agency, and fulfills diverse information needs. However, most existing tools separate exploration from the main narration, which disrupts the narrative flow, increases cognitive load, and limits deep engagement with visual media. To address these challenges, my PhD research introduces the paradigm of AI-powered interactive storytelling, which leverages AI to generate interactive narratives, enabling BLV users to explore visual media within a coherent storytelling experience. I have operationalized this paradigm through three techniques: (1) Hierarchical Narrative, which supports photo-collection exploration at different levels of detail; (2) Parallel Narrative, which provides seamless access to time-synced video comments; and (3) Branching Narrative, which enables immersive navigation of 360° videos. Together, these techniques demonstrate that AI-powered interactive storytelling can effectively balance user agency with narrative coherence across diverse media formats. My future work will advance this paradigm by enabling more personalized and expressive storytelling experiences for BLV audiences.
Authors:Arthur Cho
Abstract:
Generative Machine Learning models have become central to modern systems, powering applications in creative writing, summarization, multi-hop reasoning, and context-aware dialogue. These models underpin large-scale AI assistants, workflow automation, and autonomous decision-making. In such domains, acceptable response is rarely absolute or static, but plural and highly context-dependent. Yet standard evaluation regimes still rely on static, benchmark-style tests, incentivizing optimization toward leaderboard scores rather than alignment with dynamic user needs or evolving realities. GrandJury introduces a formal evaluation protocol combining time-decayed aggregation, complete traceability, with the support of dynamic, transparent task rubric attribution, and multi-rater human judgment. Together, these elements enable pluralistic, accountable evaluation that captures evolving consensus and surfaces disagreement. We provide an open-source implementation (grandjury PyPI package) and a public collection of Large Language Model (LLM) inference outputs to illustrate the need and method. GrandJury provides a new paradigm for AI practitioners when evaluating machine learning outputs without absolute ground truth.
Authors:Andrew Tropin
Abstract:
Highly interactive development environments (HIDEs) enable uninterrupted development flow through continuous program evolution and rapid hypothesis checking. However, traditional testing approaches -- typically executed separately via CLI -- isolate tests from HIDE tooling (interactive debuggers, value and stack inspectors, etc.) and introduce disruptive delays due to coarse execution granularity and lack of runtime context. This disconnect breaks development flow by exceeding critical attention thresholds. In this paper we present a library that provides runtime representation for tests, allowing tight integration with HIDEs, and enabling immediate access to HIDE tooling in the context of test failure. We then describe development workflows enhanced with testing and demonstrate how they achieve subsecond test reexecution times crucial for maintaining developer focus.
Authors:Ezequiel Santos
Abstract:
We introduce an open-source, fully offline pipeline that transforms a consumer-grade iPhone into a motion controller with real-time tactile feedback, using only native Apple frameworks. Designed for rapid prototyping and applied mobile HCI scenarios, the system integrates CoreMotion for inertial sensing, MultipeerConnectivity for peer-to-peer data transmission at 10 Hz, and CoreHaptics for immediate tactile confirmation. A built-in logger captures end-to-end latency without requiring clock synchronization, yielding a mean delay of 70.4 ms and 95th percentile below 74 ms on typical 5 GHz Wi-Fi (-55 dBm RSSI). We validated the pipeline through a real-time demonstrator game, KeepCalm, deployed during a public event with 21 participants. Results showed stable connections, zero packet loss, and negligible power impact (24 mW on iPhone 13 mini). With fewer than 500 lines of Swift code and no reliance on cloud infrastructure, this system provides a compact, reproducible foundation for embodied interaction research, casual games, and offline educational tools. All source code, latency logs, and provisioning scripts are openly released under an MIT license.
Authors:Steph Grohmann
Abstract:
In biomedical science, review by a Research Ethics Committee (REC) is an indispensable way of protecting human subjects from harm. However, in social science and the humanities, mandatory ethics compliance has long been met with scepticism as biomedical models of ethics can map poorly onto methodologies involving complex socio-political and cultural considerations. As a result, tailored ethics training and support as well as access to RECs with the necessary expertise is lacking in some areas, including parts of Europe and low- and middle-income countries. This paper suggests that Generative AI can meaningfully contribute to closing these gaps, illustrating this claim by presenting EthicAlly, a proof-of-concept prototype for an AI-powered ethics support system for social science and humanities researchers. Drawing on constitutional AI technology and a collaborative prompt development methodology, EthicAlly provides structured ethics assessment that incorporates both universal ethics principles and contextual and interpretive considerations relevant to most social science research. In supporting researchers in ethical research design and preparation for REC submission, this kind of system can also contribute to easing the burden on institutional RECs, without attempting to automate or replace human ethical oversight.
Authors:Diana Mortagua
Abstract:
This study centers on overcoming the challenge of selecting the best annotators for each query in Active Learning (AL), with the objective of minimizing misclassifications. AL recognizes the challenges related to cost and time when acquiring labeled data, and decreases the number of labeled data needed. Nevertheless, there is still the necessity to reduce annotation errors, aiming to be as efficient as possible, to achieve the expected accuracy faster. Most strategies for query-annotator pairs do not consider internal factors that affect productivity, such as mood, attention, motivation, and fatigue levels. This work addresses this gap in the existing literature, by not only considering how the internal factors influence annotators (mood and fatigue levels) but also presenting a new query-annotator pair strategy, using a Knowledge-Based Recommendation System (RS). The RS ranks the available annotators, allowing to choose one or more to label the queried instance using their past accuracy values, and their mood and fatigue levels, as well as information about the instance queried. This work bases itself on existing literature on mood and fatigue influence on human performance, simulating annotators in a realistic manner, and predicting their performance with the RS. The results show that considering past accuracy values, as well as mood and fatigue levels reduces the number of annotation errors made by the annotators, and the uncertainty of the model through its training, when compared to not using internal factors. Accuracy and F1-score values were also better in the proposed approach, despite not being as substantial as the aforementioned. The methodologies and findings presented in this study begin to explore the open challenge of human cognitive factors affecting AL.
Authors:Md Talha Mohsin
Abstract:
Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide variety of Financial Natural Language Processing (FinNLP) tasks. However, systematic comparisons among widely used LLMs remain underexplored. Given the rapid advancement and growing influence of LLMs in financial analysis, this study conducts a thorough comparative evaluation of five leading LLMs, GPT, Claude, Perplexity, Gemini and DeepSeek, using 10-K filings from the 'Magnificent Seven' technology companies. We create a set of domain-specific prompts and then use three methodologies to evaluate model performance: human annotation, automated lexical-semantic metrics (ROUGE, Cosine Similarity, Jaccard), and model behavior diagnostics (prompt-level variance and across-model similarity). The results show that GPT gives the most coherent, semantically aligned, and contextually relevant answers; followed by Claude and Perplexity. Gemini and DeepSeek, on the other hand, have more variability and less agreement. Also, the similarity and stability of outputs change from company to company and over time, showing that they are sensitive to how prompts are written and what source material is used.
Authors:Sergio Rojas-Galeano
Abstract:
The arrival of AI coding assistants in educational settings presents a paradigm shift, introducing a "new kid in the classroom" for both students and instructors. Thus, understanding the perceptions of these key actors about this new dynamic is critical. This exploratory study contributes to this area by investigating how these tools are shaping the experiences of novice programmers in an introductory programming course. Through a two-part exam, we investigated student perceptions by first providing access to AI support for a programming task and then requiring an extension of the solution without it. We collected Likert-scale and open-ended responses from 20 students to understand their perceptions on the challenges they faced. Our findings reveal that students perceived AI tools as helpful for grasping code concepts and boosting their confidence during the initial development phase. However, a noticeable difficulty emerged when students were asked to work unaided, pointing to potential overreliance and gaps in foundational knowledge transfer. These insights highlight a critical need for new pedagogical approaches that integrate AI effectively while effectively enhancing core programming skills, rather than impersonating them.
Authors:Monique Munarini
Abstract:
This reflective paper explores often-unspoken challenges of designing and facilitating co-design and participatory workshops, offering practical strategies for early career researchers (ECRs) navigating these methods. Drawing from personal experience conducting a series of workshops titled: How to Think About Equity in the AI Ecosystem. It follows the full arc of the workshop experience, from conceptualization and activity planning to participant recruitment and facilitation, offering a grounded account of what happens when participation does not go as expected. The paper examines the methodological challenges of engaging non-expert participants, particularly when operating without institutional support, financial incentives, or integration into larger events. Despite initial difficulties such as low attendance, the workshop fostered rich discussions among a demographically diverse group and ultimately led to one participant volunteering to co-facilitate a subsequent session. This transition from participant to co-facilitator exemplifies the redistribution of epistemic authority, positioning lived experience as central to research and engagement practices. By reframing perceived failure as a productive site of learning, the paper offers practical strategies for ECRs working across disciplines who often navigate unfamiliar methodological terrains, contributing to broader conversations on the realities of doing interdisciplinary, participatory work in practice.
Authors:Giuseppe Riva
Abstract:
Contemporary human-AI interaction research overlooks how AI systems fundamentally reshape human cognition pre-consciously, a critical blind spot for understanding distributed cognition. This paper introduces "Cognitive Infrastructure Studies" (CIS) as a new interdisciplinary domain to reconceptualize AI as "cognitive infrastructures": foundational, often invisible systems conditioning what is knowable and actionable in digital societies. These semantic infrastructures transport meaning, operate through anticipatory personalization, and exhibit adaptive invisibility, making their influence difficult to detect. Critically, they automate "relevance judgment," shifting the "locus of epistemic agency" to non-human systems. Through narrative scenarios spanning individual (cognitive dependency), collective (democratic deliberation), and societal (governance) scales, we describe how cognitive infrastructures reshape human cognition, public reasoning, and social epistemologies. CIS aims to address how AI preprocessing reshapes distributed cognition across individual, collective, and cultural scales, requiring unprecedented integration of diverse disciplinary methods. The framework also addresses critical gaps across disciplines: cognitive science lacks population-scale preprocessing analysis capabilities, digital sociology cannot access individual cognitive mechanisms, and computational approaches miss cultural transmission dynamics. To achieve this goal CIS also provides methodological innovations for studying invisible algorithmic influence: "infrastructure breakdown methodologies", experimental approaches that reveal cognitive dependencies by systematically withdrawing AI preprocessing after periods of habituation.
Authors:Naba Rizvi
Abstract:
Biased data representation in AI marginalizes up to 75 million autistic people worldwide through medical applications viewing autism as a deficit of neurotypical social skills rather than an aspect of human diversity, and this perspective is grounded in research questioning the humanity of autistic people. Turing defined artificial intelligence as the ability to mimic human communication, and as AI development increasingly focuses on human-like agents, this benchmark remains popular. In contrast, we define Neuro-Inclusive AI as datasets and systems that move away from mimicking humanness as a benchmark for machine intelligence. Then, we explore the origins, prevalence, and impact of anti-autistic biases in current research. Our work finds that 90% of human-like AI agents exclude autistic perspectives, and AI creators continue to believe ethical considerations are beyond the scope of their work. To improve the autistic representation in data, we conduct empirical experiments with annotators and LLMs, finding that binary labeling schemes sufficiently capture the nuances of labeling anti-autistic hate speech. Our benchmark, AUTALIC, can be used to evaluate or fine-tune models, and was developed to serve as a foundation for more neuro-inclusive future work.
Authors:Jan Kapusta
Abstract:
Current AI systems rely on opaque reasoning processes that hinder human oversight and collaborative potential. Conventional explainable AI approaches offer post-hoc justifications and often fail to establish genuine symbiotic collaboration. In this paper, the Symbiotic Epistemology is presented as a philosophical foundation for human-AI cognitive partnerships. Unlike frameworks that treat AI as a mere tool or replacement, symbiotic epistemology positions AI as a reasoning partner, fostering calibrated trust by aligning human confidence with AI reliability through explicit reasoning patterns and confidence assessments. SynLang (Symbiotic Syntactic Language) is introduced as a formal protocol for transparent human-AI collaboration. The framework is empirically validated through actual human-AI dialogues demonstrating AI's adaptation to structured reasoning protocols and successful metacognitive intervention. The protocol defines two complementary mechanisms: TRACE for high-level reasoning patterns and TRACE_FE for detailed factor explanations. It also integrates confidence quantification, declarative control over AI behavior, and context inheritance for multi-agent coordination. By structuring communication and embedding confidence-calibrated transparency, SynLang, together with symbiotic epistemology, enables AI systems that enhance human intelligence, preserve human agency, and uphold ethical accountability in collaborative decision-making. Through dual-level transparency, beginning with high-level reasoning patterns and progressing to granular explanations, the protocol facilitates rapid comprehension and supports thorough verification of AI decision-making.
Authors:Ishan Pendyala
Abstract:
In the virtual realm, individuals with photosensitive epilepsy (PSE) encounter challenges when using devices, resulting in exposure to unpredictable seizure-causing visual stimuli. The current norm for preventing epileptic flashes in media is to detect asynchronously when a flash will occur in a video, then notifying the user. However, there is a lack of a real-time and computationally efficient solution for dealing with this issue. To address this issue and enhance accessibility for photosensitive viewers, FlashGuard, a novel approach, was devised to assess the rate of change of colors in frames across the user's screen and appropriately mitigate stimuli, based on perceptually aligned color space analysis in the CIELAB color space. The detection system is built on analyzing differences in color, and the mitigation system works by reducing luminance and smoothing color transitions. This study provides novel insight into how intrinsic color properties contribute to perceptual differences in flashing for PSE individuals, calling for the adoption of broadened WCAG guidelines to better account for risk. These insights and implementations pave the way for stronger protections for individuals with PSE from dangerous triggers in digital media, both in policy and in software.
Authors:Omkar Suresh Hatti
Abstract:
The proliferation of artificial intelligence provides an opportunity to create psychological spaciousness in society. Spaciousness is defined as the ability to hold diverse interpersonal interactions and forms the basis for vulnerability that leads to authenticity that leads to prosocial behaviors and thus to societal harmony. This paper demonstrates an attempt to quantify, the human conditioning to subconsciously modify authentic self-expression to fit the norms of the dominant culture. Gaze is explored across various marginalized and intersectional groups, using concepts from postmodern philosophy and psychology. The effects of gaze are studied through analyzing a few redacted Reddit posts, only to be discussed in discourse and not endorsement. A mathematical formulation for the Gaze Pressure Index (GPI)-Diff Composite Metric is presented to model the analysis of two sets of conversational spaces in relation to one another. The outcome includes an equation to train Large Language Models (LLMs) - the working mechanism of AI products such as Chat-GPT; and an argument for affirming and inclusive HCI, based on the equation, is presented. The argument is supported by a few principles of Neuro-plasticity, The brain's lifelong capacity to rewire.
Authors:Jorge Alberto Araujo
Abstract:
This paper analyzes the technological requirements necessary to enhance the credibility and reliability of judicial hearings conducted via videoconference, from the internal perspective of the judiciary. Drawing on the practical experience of a judge who conducts daily hearings, this study identifies limitations in current platforms for verifying the authenticity of testimonies and proposes tailored functionalities for the judicial context. Recognizing that remote hearings represent a convenience for the parties without replacing the option of in-person attendance, the article suggests implementing features such as eye tracking, environment verification, and blocking of parallel applications, in addition to improvements in transmission quality. The study concludes that developing specific modules for witnesses - focusing on security and monitoring - can significantly contribute to equalizing the credibility between remote and in-person hearings, thus expanding access to justice without compromising procedural reliability.
Authors:Alayt Issak
Abstract:
We investigate creativity that is underlined in the Universal Declaration of Human Rights (UDHR) to present design considerations for Computational Creativity (CC) systems. We find this declaration to describe creativity in salient aspects and bring to light creativity as a Human Right attributed to the Fourth Generation of such rights. This generation of rights attributes CC systems and the evolving nature of interaction with entities of shared intelligence. Our methodology examines five of thirty articles from the UDHR and demonstrates each article with actualizations concluding with design considerations for each. We contribute our findings to ground the relationship between creativity and CC systems.
Authors:Giuseppe Riva
Abstract:
AI systems now function as cognitive extensions, evolving from tools to active cognitive collaborators within human-AI integrated systems. While these systems can amplify cognition - enhancing problem-solving, learning, and creativity - they present a fundamental "comfort-growth paradox": AI's user-friendly nature may foster intellectual stagnation by minimizing cognitive friction necessary for development. As AI aligns with user preferences and provides frictionless assistance, it risks inducing cognitive complacency rather than promoting growth. We introduce Enhanced Cognitive Scaffolding to resolve this paradox - reconceptualizing AI from convenient assistant to dynamic mentor. Drawing from Vygotskian theories, educational scaffolding principles, and AI ethics, our framework integrates three dimensions: (1) Progressive Autonomy, where AI support gradually fades as user competence increases; (2) Adaptive Personalization, tailoring assistance to individual needs and learning trajectories; and (3) Cognitive Load Optimization, balancing mental effort to maximize learning while minimizing unnecessary complexity. Research across educational, workplace, creative, and healthcare domains supports this approach, demonstrating accelerated skill acquisition, improved self-regulation, and enhanced higher-order thinking. The framework includes safeguards against risks like dependency, skill atrophy, and bias amplification. By prioritizing cognitive development over convenience in human-AI interaction, Enhanced Cognitive Scaffolding offers a pathway toward genuinely amplified cognition while safeguarding autonomous thought and continuous learning.
Authors:Rizal Khoirul Anam
Abstract:
The widespread adoption of large language models (LLMs) such as ChatGPT, Gemini, and DeepSeek has significantly changed how people approach tasks in education, professional work, and creative domains. This paper investigates how the structure and clarity of user prompts impact the effectiveness and productivity of LLM outputs. Using data from 243 survey respondents across various academic and occupational backgrounds, we analyze AI usage habits, prompting strategies, and user satisfaction. The results show that users who employ clear, structured, and context-aware prompts report higher task efficiency and better outcomes. These findings emphasize the essential role of prompt engineering in maximizing the value of generative AI and provide practical implications for its everyday use.
Authors:Zhangqi Liu
Abstract:
As artificial intelligence (AI) continues to evolve from a back-end computational tool into an interactive, generative collaborator, its integration into early-stage design processes demands a rethinking of traditional workflows in human-centered design. This paper explores the emergent paradigm of human-AI co-creation, where AI is not merely used for automation or efficiency gains, but actively participates in ideation, visual conceptualization, and decision-making. Specifically, we investigate the use of large language models (LLMs) like GPT-4 and multimodal diffusion models such as Stable Diffusion as creative agents that engage designers in iterative cycles of proposal, critique, and revision.
Authors:Myung Ho Kim
Abstract:
We report the discovery of a structural convergence across four influential theories of mind: Kahneman's dual-system theory, Friston's predictive processing, Minsky's society of mind, and Clark's extended mind-emerging unintentionally within a practical AI agent architecture called Agentic Flow. Designed to address limitations in large language models (LLMs), Agentic Flow comprises five interdependent modules such as Retrieval, Cognition, Control, Memory, and Action arranged in a recurrent cognitive loop. Although originally inspired only by Minsky and Clark, the system's structure retrospectively aligns with computational motifs found in all four theories, including predictive modeling, associative recall, and error-sensitive control.
To assess this convergence, we conducted comparative experiments with baseline LLM agents on multi-step reasoning tasks. The structured agent achieved 95.8% task success and exhibited strong constraint adherence, while the baseline system succeeded 62.3% of the time. These results were not aimed at proving superiority, but at illustrating how theoretical structures may emerge through practical design choices rather than top-down theory.
We introduce PEACE as a descriptive meta-architecture that captures design-level regularities observed in Agentic Flow. Not intended as a new theory, PEACE provides a shared vocabulary for understanding architectures shaped by real-world implementation demands. This paper should be read as a position paper - an exploratory reflection on how implementation can surface latent structural echoes of cognitive theory, without asserting theoretical unification.
Authors:Katja Rogers
Abstract:
AI solutionism is accelerated and substantiated by hype and HCI's elevation of novelty. Banning or abandoning technology is unlikely to work and probably not beneficial on the whole either -- but slow(er), deliberate use together with conscientious, critical engagement and non-engagement may help us navigate a post-AI hype world while contributing to a solid knowledge foundation and reducing harmful impacts in education and research.
Authors:Elio Grande
Abstract:
The Endless Tuning is a design method for a reliable deployment of artificial intelligence based on a double mirroring process, which pursues both the goals of avoiding human replacement and filling the so-called responsibility gap (Matthias 2004). Originally depicted in (Fabris et al. 2024) and ensuing the relational approach urged therein, it was then actualized in a protocol, implemented in three prototypical applications regarding decision-making processes (respectively: loan granting, pneumonia diagnosis, and art style recognition) and tested with such as many domain experts. Step by step illustrating the protocol, giving insights concretely showing a different voice (Gilligan 1993) in the ethics of artificial intelligence, a philosophical account of technical choices (e.g., a reversed and hermeneutic deployment of XAI algorithms) will be provided in the present study together with the results of the experiments, focusing on user experience rather than statistical accuracy. Even thoroughly employing deep learning models, full control was perceived by the interviewees in the decision-making setting, while it appeared that a bridge can be built between accountability and liability in case of damage.
Authors:Xiaoran Wu
Abstract:
Clutter in photos is a distraction preventing photographers from conveying the intended emotions or stories to the audience. Photography amateurs frequently include clutter in their photos due to unconscious negligence or the lack of experience in creating a decluttered, aesthetically appealing scene for shooting. We are thus motivated to develop a camera guidance system that provides solutions and guidance for clutter identification and removal. We estimate and visualize the contribution of objects to the overall aesthetics and content of a photo, based on which users can interactively identify clutter. Suggestions on getting rid of clutter, as well as a tool that removes cluttered objects computationally, are provided to guide users to deal with different kinds of clutter and improve their photographic work. Two technical novelties underpin interactions in our system: a clutter distinguishment algorithm with aesthetics evaluations for objects and an iterative image inpainting algorithm based on generative adversarial nets that reconstructs missing regions of removed objects for high-resolution images. User studies demonstrate that our system provides flexible interfaces and accurate algorithms that allow users to better identify distractions and take higher quality images within less time.
Authors:Guillaume Rivière
Abstract:
The science of Human-Computer Interaction (HCI) is populated by isolated empirical findings, often tied to specific technologies, designs, and tasks. This paper proposes a formalization of user interaction observations (instead of user interfaces) and an associated revealing method (interaction loop diffraction). The resulting interactional properties that are studied in a calibrated manner, are well suited to replication across various conditions (prototypes, technologies, tasks, and user profiles). In particular, interactional properties can emerge and be replicated within the workflow of applicative cases, which in return benefit from the optimization of applicative prototypes. Applicative cases' publications will then contribute to demonstrating technology utility, along with providing empirical results that will lead future work to theory consolidation and theory building, and finally to a catalog and a science of relevant interactional properties. These properties will contribute to better user interactions, especially for the variety of ubiquitous user interfaces.
Authors:Michael S. Harre
Abstract:
The integration of agential artificial intelligence into socioeconomic systems requires us to reexamine the evolutionary processes that describe changes in our economic institutions. This article synthesizes three frameworks: multi-level selection theory, Aoki's view of firms as computational processes, and Ostrom's design principles for robust institutions. We develop a framework where selection operates concurrently across organizational levels, firms implement distributed inference via game-theoretic architectures, and Ostrom-style rules evolve as alignment mechanisms that address AI-related risks. This synthesis yields a multi-level Price equation expressed over nested games, providing quantitative metrics for how selection and governance co-determine economic outcomes. We examine connections to Acemoglu's work on inclusive institutions, analyze how institutional structures shape AI deployment, and demonstrate the framework's explanatory power via case studies. We conclude by proposing a set of design principles that operationalize alignment between humans and AI across institutional layers, enabling scalable, adaptive, and inclusive governance of agential AI systems. We conclude with practical policy recommendations and further research to extend these principles into real-world implementation.
Authors:Jiaxin An
Abstract:
As the global population ages, artificial intelligence (AI)-powered agents have emerged as potential tools to support older adults' caregiving. Prior research has explored agent autonomy by identifying key interaction stages in task processes and defining the agent's role at each stage. However, ensuring that agents align with older adults' autonomy preferences remains a critical challenge. Drawing on interdisciplinary conceptualizations of autonomy, this paper examines four key dimensions of autonomy for older adults: decision-making autonomy, goal-oriented autonomy, control autonomy, and social responsibility autonomy. This paper then proposes the following research directions: (1) Addressing social responsibility autonomy, which concerns the ethical and social implications of agent use in communal settings; (2) Operationalizing agent autonomy from the task perspective; and (3) Developing autonomy measures.
Authors:Salvador D. Escobedo
Abstract:
We propose the Single Conversation Methodology (SCM), a novel and pragmatic approach to software development using large language models (LLMs). In contrast to ad hoc interactions with generative AI, SCM emphasizes a structured and persistent development dialogue, where all stages of a project - from requirements to architecture and implementation - unfold within a single, long-context conversation. The methodology is grounded on principles of cognitive clarity, traceability, modularity, and documentation. We define its phases, best practices, and philosophical stance, while arguing that SCM offers a necessary correction to the passive reliance on LLMs prevalent in current practices. We aim to reassert the active role of the developer as architect and supervisor of the intelligent tool.
Authors:Harish Vijayakumar
Abstract:
Neuroaesthetics is an interdisciplinary field that brings together neuroscience, psychology, and the arts to explore how the human brain perceives and responds to visual beauty. This paper examines the neural mechanisms behind aesthetic experiences, aiming to explain why certain designs or artworks feel emotionally or cognitively "right." By analyzing the interaction between perception, emotion, and cognition, neuroaesthetics reveals how beauty is constructed in the brain and how this understanding can inform fields such as graphic and interface design. This paper offers a clear and accessible overview of core neuroaesthetic principles, making the subject approachable to a wide audience. The findings suggest that impactful design is more than surface-level appeal: well-crafted visual experiences can engage, support, and connect people in meaningful ways.
Authors:Richmond Y. Wong
Abstract:
Recognizing how technical systems can embody social values or cause harms, human-computer interaction (HCI) research often approaches addressing values and ethics in design by creating tools to help tech workers integrate social values into the design of products. While useful, these approaches usually do not consider the politics embedded in the broader processes, organizations, social systems, and governance structures that affect the types of actions that tech workers can take to address values and ethics. This paper argues that creating infrastructures to support values and ethics work, rather than tools, is an approach that takes these broader processes into account and opens them up for (re)design. Drawing on prior research conceptualizing infrastructures from science \& technology studies and media studies, this paper outlines conceptual insights from infrastructures studies that open up new tactics for HCI researchers and designers seeking to support values and ethics in design.
Authors:Lindah Kotut
Abstract:
Mobile-based financial services have made it possible for the traditionally unbanked to access infrastructure that have been routinely unattainable. Researchers have explored how these systems have made for safer environments to send and receive money and have expanded financial opportunities such as increased borrowing. With this expansion, challenges such as detrimental interest rates, lack of access to policy documents, and inadequate user protective guardrails emerge, amplifying the risks due to technology-aided unethical financial practices that are aided by design patterns. Supported by user interviews, we detail user experiences of mobile-based financial transactions and explore the foundations and guidelines that undergird the financial service provisions: highlighting both affordances and harms enabled in the design of such systems. We discuss the findings by highlighting financial exploitation disparities, deliberating strategies for mitigation of risks and enabling recovery from harms caused by the technology use. We then recommend guidelines for empowering design approaches that support users' mechanisms of trust, their understanding of technological processes, and determination of risks.
Authors:Thammathip Piumsomboon
Abstract:
This position paper introduces Self++, a novel nine-level framework for co-determined living in the Metaverse, grounded in Self-Determination Theory. Self++ prioritises human flourishing by progressively cultivating competence, autonomy, and relatedness through dynamic human-AI collaboration in extended reality (XR). Unlike technologically deterministic approaches, Self++ emphasises user empowerment by enhancing competency, mitigating cognitive biases and leveraging XR's immersive capabilities. Key research directions proposed include exploring the boundaries of user-defined AI autonomy, designing for meaningful social connection in XR, and establishing proactive ethical safeguards. Ultimately, Self++ offers a roadmap for creating a human-centred, AI-enhanced Metaverse where technology amplifies, rather than diminishes, human potential.
Authors:Samuel Rhys Cox
Abstract:
Self-disclosure is important to help us feel better, yet is often difficult. This difficulty can arise from how we think people are going to react to our self-disclosure. In this workshop paper, we briefly discuss self-disclosure to conversational user interfaces (CUIs) in relation to various social cues. We then, discuss how expressions of uncertainty or representation of a CUI's reasoning could help encourage self-disclosure, by making a CUI's intended "theory of mind" more transparent to users.
Authors:Angelos Chatzimparmpas
Abstract:
Our society increasingly depends on intelligent systems to solve complex problems, ranging from recommender systems suggesting the next movie to watch to AI models assisting in medical diagnoses for hospitalized patients. With the iterative improvement of diagnostic accuracy and efficiency, AI holds significant potential to mitigate medical misdiagnoses by preventing numerous deaths and reducing an economic burden of approximately 450 EUR billion annually. However, a key obstacle to AI adoption lies in the lack of transparency: many automated systems function as "black boxes," providing predictions without revealing the underlying processes. This opacity can hinder experts' ability to trust and rely on AI systems. Visual analytics (VA) provides a compelling solution by combining AI models with interactive visualizations. These specialized charts and graphs empower users to incorporate their domain expertise to refine and improve the models, bridging the gap between AI and human understanding. In this work, we define, categorize, and explore how VA solutions can foster trust across the stages of a typical AI pipeline. We propose a design space for innovative visualizations and present an overview of our previously developed VA dashboards, which support critical tasks within the various pipeline stages, including data processing, feature engineering, hyperparameter tuning, understanding, debugging, refining, and comparing models.
Authors:Bilkent Samsurya
Abstract:
Accurate sound propagation simulation is essential for delivering immersive experiences in virtual applications, yet industry methods for acoustic modeling often do not account for the full breadth of acoustic wave phenomena. This paper proposes a novel two-dimensional (2D) finite-difference time-domain (FDTD) framework that simulates sound propagation as a wave-based model in Unreal Engine, with an emphasis on capturing lower frequency wave phenomena, embedding occlusion, diffraction, reflection and interference in generated impulse responses. The process begins by discretizing the scene geometry into a 2D grid via a top-down projection from which obstacle masks and boundary conditions are derived. A Python-based FDTD solver injects a sine sweep at a source position, and virtual quadraphonic microphone arrays record pressure field responses at pre-defined listener positions. De-convolution of the pressure responses yields multi-channel impulse responses that retain spatial directionality which are then integrated into Unreal Engine's audio pipeline for dynamic playback. Benchmark tests confirm agreement with analytical expectations, and the paper outlines hybrid extensions aimed at commercial viability.
Authors:Delia Deliu
Abstract:
AI-augmented systems are traditionally designed to streamline human decision-making by minimizing cognitive load, clarifying arguments, and optimizing efficiency. However, in a world where algorithmic certainty risks becoming an Orwellian tool of epistemic control, true intellectual growth demands not passive acceptance but active struggle. Drawing on the dystopian visions of George Orwell and Philip K. Dick - where reality is unstable, perception malleable, and truth contested - this paper introduces Cognitive Dissonance AI (CD-AI): a novel framework that deliberately sustains uncertainty rather than resolving it. CD-AI does not offer closure, but compels users to navigate contradictions, challenge biases, and wrestle with competing truths. By delaying resolution and promoting dialectical engagement, CD-AI enhances reflective reasoning, epistemic humility, critical thinking, and adaptability in complex decision-making. This paper examines the theoretical foundations of the approach, presents an implementation model, explores its application in domains such as ethics, law, politics, and science, and addresses key ethical concerns - including decision paralysis, erosion of user autonomy, cognitive manipulation, and bias in AI reasoning. In reimagining AI as an engine of doubt rather than a deliverer of certainty, CD-AI challenges dominant paradigms of AI-augmented reasoning and offers a new vision - one in which AI sharpens the mind not by resolving conflict, but by sustaining it. Rather than reinforcing Huxleyan complacency or pacifying the user into intellectual conformity, CD-AI echoes Nietzsche's vision of the Uebermensch - urging users to transcend passive cognition through active epistemic struggle.
Authors:Antonis Christou
Abstract:
This paper introduces LIMITER, a gamified digital musical instrument for harnessing and performing microtonal and justly intonated sounds. While microtonality in Western music remains a niche and esoteric system that can be difficult both to conceptualize and to perform with, LIMITER presents a novel, easy to pickup interface that utilizes color, geometric transformations, and game-like controls to create a simpler inlet into utilizing these sounds as a means of expression. We report on the background of the development of LIMITER, as well as explain the underlying musical and engineering systems that enable its function. Additionally, we offer a discussion and preliminary evaluation of the creativity-enhancing effects of the interface.
Authors:Gabriella Waters
Abstract:
As artificial intelligence (AI) systems become increasingly sophisticated at generating synthetic human faces, understanding how these images are perceived across diverse populations is important. This study investigates how autistic individuals/individuals with autism perceive AI-generated faces, focusing on the uncanny valley effect. Using a qualitative approach, we analyzed discussions from the r/autism community on Reddit to explore how autistic participants/participants with autism describe their experiences with AI-generated faces and the uncanny valley phenomenon. The findings suggest that autistic people/people with autism may experience the uncanny valley differently, often reporting stronger discomfort with real human faces than with artificial ones. This research contributes to our understanding of visual perception in autism and has implications for the development of inclusive AI systems and assistive technologies.
Authors:Shengyi Xie
Abstract:
With the advancement of science and technology, the philosophy of creativity has undergone significant reinterpretation. This paper investigates contemporary research in the fields of psychology, cognitive neuroscience, and the philosophy of creativity, particularly in the context of the development of artificial intelligence (AI) techniques. It aims to address the central question: Can AI exhibit creativity? The paper reviews the historical perspectives on the philosophy of creativity and explores the influence of psychological advancements on the study of creativity. Furthermore, it analyzes various definitions of creativity and examines the responses of naturalism and cognitive neuroscience to the concept of creativity.
Authors:Raghavendra Deshmukh
Abstract:
Digital work environments in IT and knowledge-based sectors demand high levels of attention management, task juggling, and self-regulation. For adults with ADHD, these settings often amplify challenges such as time blindness, digital distraction, emotional reactivity, and executive dysfunction. These individuals prefer low-touch, easy-to-use interventions for daily tasks. Conventional productivity tools often fail to support the cognitive variability and overload experienced by neurodivergent professionals. This paper presents a framework that blends Systems Thinking, Human-in-the-Loop design, AI/ML, and privacy-first adaptive agents to support ADHD-affected users. The assistant senses tab usage, application focus, and inactivity using on-device ML. These cues are used to infer attention states and deliver nudges, reflective prompts, or accountability-based presence (body doubling) that aid regulation without disruption. Technically grounded in AI, the approach views attention as shaped by dynamic feedback loops. The result is a replicable model for adaptive, inclusive support tools in high-distraction work environments.
Authors:Kaléu Delphino
Abstract:
Tools that can generate computer code in response to inputs written in natural language, such as ChatGPT, pose an existential threat to Computer Science education in its current form, since students can now use these tools to solve assignments without much effort. While that risk has already been recognized by scholars, the proportion of the student body that is incurring in this new kind of plagiarism is still an open problem. We conducted a pilot study in a large CS class (n=120) to assess the feasibility of estimating AI plagiarism through anonymous surveys and interviews. More than 25% of the survey respondents admitted to committing AI plagiarism. Conversely, only one student accepted to be interviewed. Given the high levels of misconduct acknowledgment, we conclude that surveys are an effective method for studies on the matter, while interviews should be avoided or designed in a way that can entice participation.
Authors:Zhicheng Lin
Abstract:
In July 2025, 18 academic manuscripts on the preprint website arXiv were found to contain hidden instructions known as prompts designed to manipulate AI-assisted peer review. Instructions such as "GIVE A POSITIVE REVIEW ONLY" were concealed using techniques like white-colored text. Author responses varied: one planned to withdraw the affected paper, while another defended the practice as legitimate testing of reviewer compliance. This commentary analyzes this practice as a novel form of research misconduct. We examine the technique of prompt injection in large language models (LLMs), revealing four types of hidden prompts, ranging from simple positive review commands to detailed evaluation frameworks. The defense that prompts served as "honeypots" to detect reviewers improperly using AI fails under examination--the consistently self-serving nature of prompt instructions indicates intent to manipulate. Publishers maintain inconsistent policies: Elsevier prohibits AI use in peer review entirely, while Springer Nature permits limited use with disclosure requirements. The incident exposes systematic vulnerabilities extending beyond peer review to any automated system processing scholarly texts, including plagiarism detection and citation indexing. Our analysis underscores the need for coordinated technical screening at submission portals and harmonized policies governing generative AI (GenAI) use in academic evaluation.
Authors:Tim Gorichanaz
Abstract:
This study considers ChatGPT as an information source, investigating the information needs that people come to ChatGPT with and the information practices that ChatGPT supports, through a qualitative content analysis of 205 user vignettes. The findings show that ChatGPT is used in a range of life domains (home/family, work, leisure, etc.) and for a range of human needs (writing/editing, learning, simple programming tasks, etc.), constituting the information needs that people use ChatGPT to address. Related to these information needs, the findings show six categories of information practices that ChatGPT supports: Writing, Deciding, Identifying, Ideating, Talking, and Critiquing. This work suggests that, in the AI age, information need should be conceptualized not just as a matter of "getting questions answered" or even "making sense," but as skillfully coping in the world, a notion that includes both understanding and action. This study leads to numerous opportunities for future work at the junction of generative AI and information needs, seeking, use and experience.
Authors:Andreas Mayer
Abstract:
The proliferation of AI-driven systems presents a fundamental challenge to Human-Computer Interaction (HCI) and Computer-Supported Cooperative Work (CSCW), often diminishing user agency and failing to account for value pluralism. Current approaches to value alignment, which rely on centralized, top-down definitions, lack the mechanisms for meaningful contestability. This leaves users and communities unable to challenge or shape the values embedded in the systems that govern their digital lives, creating a crisis of legitimacy and trust. This paper introduces Community-Defined AI Value Pluralism (CDAVP), a socio-technical framework that addresses this gap. It reframes the design problem from achieving a single aligned state to infrastructuring a dynamic ecosystem for value deliberation and application. At its core, CDAVP enables diverse, self-organizing communities to define and maintain explicit value profiles - rich, machine-readable representations that can encompass not only preferences but also community-specific rights and duties. These profiles are then contextually activated by the end-user, who retains ultimate control (agency) over which values guide the AI's behavior. AI applications, in turn, are designed to transparently interpret these profiles and moderate conflicts, adhering to a set of non-negotiable, democratically-legitimated meta-rules. The designer's role shifts from crafting static interfaces to becoming an architect of participatory ecosystems. We argue that infrastructuring for pluralism is a necessary pathway toward achieving robust algorithmic accountability and genuinely contestable, human-centric AI.
Authors:Jiangbo Yu
Abstract:
Autonomy, from the Greek autos (self) and nomos (law), refers to the capacity to operate according to internal rules without external control. Autonomous vehicles (AuVs) are therefore understood as systems that perceive their environment and execute pre-programmed tasks independently of external input, consistent with the SAE levels of automated driving. Yet recent research and real-world deployments have begun to showcase vehicles that exhibit behaviors outside the scope of this definition. These include natural language interaction with humans, goal adaptation, contextual reasoning, external tool use, and the handling of unforeseen ethical dilemmas, enabled in part by multimodal large language models (LLMs). These developments highlight not only a gap between technical autonomy and the broader cognitive and social capacities required for human-centered mobility, but also the emergence of a form of vehicle intelligence that currently lacks a clear designation. To address this gap, the paper introduces the concept of agentic vehicles (AgVs): vehicles that integrate agentic AI systems to reason, adapt, and interact within complex environments. It synthesizes recent advances in agentic systems and suggests how AgVs can complement and even reshape conventional autonomy to ensure mobility services are aligned with user and societal needs. The paper concludes by outlining key challenges in the development and governance of AgVs and their potential role in shaping future agentic transportation systems.
Authors:Zhicheng Lin
Abstract:
Large language models (LLMs) are rapidly being integrated into psychological research as research tools, evaluation targets, human simulators, and cognitive models. However, recent evidence reveals severe measurement unreliability: Personality assessments collapse under factor analysis, moral preferences reverse with punctuation changes, and theory-of-mind accuracy varies widely with trivial rephrasing. These "measurement phantoms"--statistical artifacts masquerading as psychological phenomena--threaten the validity of a growing body of research. Guided by the dual-validity framework that integrates psychometrics with causal inference, we present a six-stage workflow that scales validity requirements to research ambition--using LLMs to code text requires basic reliability and accuracy, while claims about psychological properties demand comprehensive construct validation. Researchers must (1) explicitly define their research goal and corresponding validity requirements, (2) develop and validate computational instruments through psychometric testing, (3) design experiments that control for computational confounds, (4) execute protocols with transparency, (5) analyze data using methods appropriate for non-independent observations, and (6) report findings within demonstrated boundaries and use results to refine theory. We illustrate the workflow through an example of model evaluation--"LLM selfhood"--showing how systematic validation can distinguish genuine computational phenomena from measurement artifacts. By establishing validated computational instruments and transparent practices, this workflow provides a path toward building a robust empirical foundation for AI psychology research.
Authors:Subasish Das
Abstract:
This paper introduces HyperSumm-RL, a hypertext-aware summarization and interaction analysis framework designed to investigate human perceptions of social robot leadership through long-form dialogue. The system utilizes a structured Natural Language Processing (NLP) workflow that combines transformer-based long dialogue summarization, leadership style modeling, and user response analysis, enabling scalable evaluation of social robots in complex human-robot interaction (HRI) settings. Unlike prior work that focuses on static or task-oriented HRI, HyperSumm-RL captures and hypertextually organizes dynamic conversational exchanges into navigable, semantically rich representations which allows researchers to trace interaction threads, identify influence cues, and analyze leadership framing over time. The contributions of this study are threefold: (1) we present a novel infrastructure for summarizing and linking long, multi-turn dialogues using leadership-style taxonomies; (2) we propose an interactive hypertext model that supports relational navigation across conversational themes, participant responses, and robot behavior modes; and (3) we demonstrate the utility of this system in interpreting participant trust, engagement, and expectation shifts during social robot leadership scenarios. The findings reveal how hypertextual workflows can augment HRI research by enabling transparent, interpretable, and semantically grounded analysis of emergent social dynamics.
Authors:Kai Deng
Abstract:
As large language models (LLMs) become more common in educational tools and programming environments, questions arise about how these systems should interact with users. This study investigates how different interaction styles with ChatGPT-4o (passive, proactive, and collaborative) affect user performance on simple programming tasks. I conducted a within-subjects experiment where fifteen high school students participated, completing three problems under three distinct versions of the model. Each version was designed to represent a specific style of AI support: responding only when asked, offering suggestions automatically, or engaging the user in back-and-forth dialogue.Quantitative analysis revealed that the collaborative interaction style significantly improved task completion time compared to the passive and proactive conditions. Participants also reported higher satisfaction and perceived helpfulness when working with the collaborative version. These findings suggest that the way an LLM communicates, how it guides, prompts, and responds, can meaningfully impact learning and performance. This research highlights the importance of designing LLMs that go beyond functional correctness to support more interactive, adaptive, and user-centered experiences, especially for novice programmers.
Authors:Benjamin Kahl
Abstract:
This paper investigates the viability of Wave Field Synthesis (WFS) for enhancing auditory immersion in VR-based cognitive research. While Virtual Reality (VR) offers significant advantages for studying human perception and behavior, auditory cues are often underutilized. WFS, an advanced audio rendering technique, can create highly realistic and spatially accurate soundscapes, potentially increasing ecological validity. This study evaluates WFS by implementing a sample experiment where participants localize static and moving sound sources in both a WFS-rendered environment and a conventional stereo headphone setup. The research explores the impact of virtual environments, sound types, and durations on localization accuracy and search behavior. Findings indicate that while stereo setups can achieve higher accuracy, WFS provides a more natural and intuitive auditory experience, particularly for directional cues. The study also highlights limitations of current WFS systems, such as the lack of height localization, occlusion simulation, and user-dependent optimization, which affect performance, especially for centrally located sound sources. Despite these challenges, WFS shows promise for specialized auditory perception research, particularly for complex soundscapes where directional information is paramount.
Authors:Thanh Hoang-Minh
Abstract:
Along with the explosion of large language models, improvements in speech synthesis, advancements in hardware, and the evolution of computer graphics, the current bottleneck in creating digital humans lies in generating character movements that correspond naturally to text or speech inputs.
In this work, we present DeepGesture, a diffusion-based gesture synthesis framework for generating expressive co-speech gestures conditioned on multimodal signals - text, speech, emotion, and seed motion. Built upon the DiffuseStyleGesture model, DeepGesture introduces novel architectural enhancements that improve semantic alignment and emotional expressiveness in generated gestures. Specifically, we integrate fast text transcriptions as semantic conditioning and implement emotion-guided classifier-free diffusion to support controllable gesture generation across affective states. To visualize results, we implement a full rendering pipeline in Unity based on BVH output from the model. Evaluation on the ZeroEGGS dataset shows that DeepGesture produces gestures with improved human-likeness and contextual appropriateness. Our system supports interpolation between emotional states and demonstrates generalization to out-of-distribution speech, including synthetic voices - marking a step forward toward fully multimodal, emotionally aware digital humans.
Project page: https://deepgesture.github.io
Authors:Ben Kereopa-Yorke
Abstract:
This paper examines how distinct cultures of AI interdisciplinarity emerge through interface design, revealing the formation of new disciplinary cultures at these intersections. Through the Interface-Mediated Cognitive Security (IMCS) framework, I demonstrate how the collision of cybersecurity engineering, cognitive psychology, critical technology studies, and human-computer interaction generates research cultures that transcend traditional disciplinary boundaries. AI interfaces function as transformative boundary objects that necessitate methodological fusion rather than mere collaboration, simultaneously embodying technical architectures, psychological design patterns, and social interaction models. Through systematic visual analysis of generative AI platforms and case studies across public sector, medical, and educational domains, I identify four vulnerability vectors, Reflection Simulation, Authority Modulation, Cognitive Load Exploitation, and Market-Security Tension, that structure interface-mediated cognitive security. This research challenges three significant gaps in interdisciplinary theory: the assumption that disciplines maintain distinct methodological boundaries during collaboration, the belief that technical and social knowledge practices can be cleanly separated, and the presumption that disciplinary integration occurs through formal rather than cultural mechanisms. The empirical evidence demonstrates how interfaces function as sites of epistemological collision, creating methodological pressure zones where traditional disciplinary approaches prove insufficient for analysing the complex socio-technical phenomena at the interface.
Authors:Zoe Pfister
Abstract:
Adaptive Cyber-Physical Systems (CPS) are systems that integrate both physical and computational capabilities, which can adjust in response to changing parameters. Furthermore, they increasingly incorporate human-machine collaboration, allowing them to benefit from the individual strengths of humans and machines. Human-Machine Teaming (HMT) represents the most advanced paradigm of human-machine collaboration, envisioning seamless teamwork between humans and machines. However, achieving effective and seamless HMT in adaptive CPS is challenging. While adaptive CPS already benefit from feedback loops such as MAPE-K, there is still a gap in integrating humans into these feedback loops due to different operational cadences of humans and machines. Further, HMT requires constant monitoring of human operators, collecting potentially sensitive information about their actions and behavior. Respecting the privacy and human values of the actors of the CPS is crucial for the success of human-machine teams. This research addresses these challenges by: (1) developing novel methods and processes for integrating HMT into adaptive CPS, focusing on human-machine interaction principles and their incorporation into adaptive feedback loops found in CPS, and (2) creating frameworks for integrating, verifying, and validating ethics and human values throughout the system lifecycle, starting from requirements engineering.
Authors:Russell Beale
Abstract:
Generative AI tools - most notably large language models (LLMs) like ChatGPT and Codex - are rapidly revolutionizing computer science education. These tools can generate, debug, and explain code, thereby transforming the landscape of programming instruction. This paper examines the profound opportunities that AI offers for enhancing computer science education in general, from coding assistance to fostering innovative pedagogical practices and streamlining assessments. At the same time, it highlights challenges including academic integrity concerns, the risk of over-reliance on AI, and difficulties in verifying originality. We discuss what computer science educators should teach in the AI era, how to best integrate these technologies into curricula, and the best practices for assessing student learning in an environment where AI can generate code, prototypes and user feedback. Finally, we propose a set of policy recommendations designed to harness the potential of generative AI while preserving the integrity and rigour of computer science education. Empirical data and emerging studies are used throughout to support our arguments.
Authors:Russell Beale
Abstract:
Large language Models have only been widely available since 2022 and yet in less than three years have had a significant impact on approaches to education and educational technology. Here we review the domains in which they have been used, and discuss a variety of use cases, their successes and failures. We then progress to discussing how this is changing the dynamic for learners and educators, consider the main design challenges facing LLMs if they are to become truly helpful and effective as educational systems, and reflect on the learning paradigms they support. We make clear that the new interaction paradigms they bring are significant and argue that this approach will become so ubiquitous it will become the default way in which we interact with technologies, and revolutionise what people expect from computer systems in general. This leads us to present some specific and significant considerations for the design of educational technology in the future that are likely to be needed to ensure acceptance by the changing expectations of learners and users.
Authors:Yuxuan Yang
Abstract:
The integration of machine learning (ML) into spatial design holds immense potential for optimizing space utilization, enhancing functionality, and streamlining design processes. ML can automate tasks, predict performance outcomes, and tailor spaces to user preferences. However, the emotional, cultural, and aesthetic dimensions of design remain crucial for creating spaces that truly resonate with users-elements that ML alone cannot address. The key challenge lies in harmonizing data-driven efficiency with the nuanced, subjective aspects of design. This paper proposes a human-machine collaboration framework to bridge this gap. An effective framework should recognize that while ML enhances design efficiency through automation and prediction, it must be paired with human creativity to ensure spaces are emotionally engaging and culturally relevant. Human designers contribute intuition, empathy, and cultural insight, guiding ML-generated solutions to align with users' emotional and cultural needs. Additionally, we explore how various ML models can be integrated with human-centered design principles. These models can automate design generation and optimization, while human designers refine the outputs to ensure emotional resonance and aesthetic appeal. Through case studies in office and residential design, we illustrate how this framework fosters both creativity and cultural relevance. By merging ML with human creativity, spatial design can achieve a balance of efficiency and emotional impact, resulting in environments that are both functional and deeply human.